Simple explanation of a t-test and its meaning using Excel 2013

For those of us you say that stats are ‘dry’ – you are clearly being shown the wrong numbers! Statistics are everywhere and it’s well worth understanding what you’re talking about, when you talk about data, numbers or languages such as R or Python for data and number crunching. Statistics knowledge is extremely useful, and it is accessible when you put your brain to it!

So, for example, what does a pint of Guinness teaches us about statistics? In a visit to Ireland, Barack Obama declared that the Guinness tasted better in Ireland, and the Irish keep the ‘good stuff’ to themselves.
Can we use science to identify whether there is a difference between the enjoyment of a pint of Guinness consumed in Ireland, and pints consumed outside of Ireland?
A Journal of Food Science investigation detailed research where four investigators travelled around different countries in order to taste-test the Guinness poured inside and outside of Ireland. To do this, researchers compared the average score of pints poured inside of Ireland versus pints poured outside of Ireland.

How did they do the comparison? they used a t-test, which was devised by William Searly Gosset, who worked at the Guinness factory as a scientist, with the objective of using science to produce the perfect pint.
The t-test helps us to work out whether two sets of data are actually different.
It takes two sets of data, and calculates:

the count – the number of data points
the mean, also known as the average i.e. the sum total of the data, divided by the number of data points
The standard deviation – tells you roughly how far, on average, each number in your list of data varies from the average value of the list itself.

The t-test is more sophisticated test to tell us if those two means
of those two groups of data are different.

In the video, I go ahead and try it, using Excel formulas:

COUNT – count up the number of data points. The formula is simply COUNT, and we calculate this
AVERAGE – This is calculated using the Average Excel calculated formula.
STDEV – again, another Excel calculation, to tell us the standard deviation.
TTEST – the Excel calculation, which wants to know:

Array 1 – your first group of data
Array 2 – your second group of data
Tail – do you know if the mean of the second group will definitely by higher OR lower than the second group, and it’s only likely to go in that direction? If so, use 1. If you are not sure if it’s higher or lower, then use 2.
Type –
if your data points are related to one another in each group, use 1
if the data points in each group are unrelated to each other, and there is equal variances, use 2
if the data points in each group are unrelated to each other, and if you are unsure if there are equal variances, use 3

And that’s your result. But how do you know what it means? It produces a number, called p, which is simply the probability.

The t-test: simple way of establishing whether there are significant differences between two groups of data. It uses the null hypothesis: this is the devil’s advocate position, which says that there is no difference between the two groups It uses the sample size, the mean, and the standard deviation to produce a p value.
The lower the p value, the more likely that there is a difference in the two groups i.e. that something happened to make a difference in the two groups.

In social science, p is usually set to 5% i.e. only 1 time in 20, is the difference due to chance.
In the video, the first comparison does not have a difference ‘proven, but the second comparison does.
So next time you have a good pint of Guinness, raise your glass to statistics!

 


 

Love,
Jen

Jen’s PASS Diary: SQLSaturday Edinburgh: My heartfelt thanks go to…

SQLSaturday Edinburgh went ahead last Saturday, June 13th, and everyone had a great day. It’s clear that people in the community believe in what I am doing. They voted with their feet to attend, to speak, and to sponsor. We had high quality speakers delivering world-class content – 8 MVPs, 2 Microsoft staff, and the remainder are international speakers – and we know that Content is King.

Basically, SQLSaturday Edinburgh Business Intelligence edition was the turning point for the Business Analytics and Business Intelligence ( SQL Server based ) community in the UK.

  • Our event only had five people who had spoken at SQLBits (Carmel Gunn, Bob Duffy, Gary Short, Chris Webb and Satya Jayanty).
  • three of our speakers (Mark Wilcock, Chris Webb and Bob Phillips) all spoke at PASS Business Analytics Conference last April in San Jose, and they all spoke at PASS SQLSaturday London Business Analytics in November 2014.
  • The other speakers have delivered sessions internationally in their field of expertise: Visio, SharePoint, CRM, and this was the first time they’d spoken at a SQL event.

We tried to be more BI and BA focused, and did it work? The feedback so far is a resounding YES. We didn’t try to squeeze the formula for other SQL events onto this one, jam some R in there, and announce it as an analytics event. The content was focused on what we do with data, why, and what the business value is.There will be more on this in future posts. In the meantime, however, I have a lot of thank yous!

I also want to say a heartfelt thank you to the volunteers, without whom, the event would not have happened.

  • Malcolm Smith
  • Izabela Borzecka
  • Robert French
  • Melissa Coates ( Twitter ) who helped by collating templates from her events to use.
  • Prathy Kamasani ( Twitter ) who is just simply amazing. Her smile lifts me and she has really helped to keep me going with her sunny attitude and unfailing support.
  • Rodney Kidd ( Twitter ) has been a rock and a great listener, as well as a helpful, kind gentleman.

Prathy, Rodney – I cannot thank you enough, and your friendship and support will stay with me forever. Thank you.

I want to thank the following sponsors for putting themselves forward to support me in what I’m doing for the Business Intelligence and emerging Business Analytics community in the UK. Without them, there would be no event. Fact.

SQLSaturday Edinburgh 388 Sponsors

I also want to thank our amazing SQLSaturday speakers. If you’d like to download their slides, you will find them on the site.

The speakers were, in order of appearance:

Jon Woodward ( Twitter / Website )

Iain Elder (Twitter)

David Parker ( Website )

Chris Webb ( Website / Twitter )

Ian MacDonald ( Website )

Adam Vero ( Website )

Bob Duffy ( Website / Twitter )

Carmel Gunn ( Website / Twitter )

Peyman Blumstengel ( Website )

Murali Nagaraj ( Website )

Peter Baddeley ( Website / Twitter )

Tom Sykes ( Twitter )

Niall MacLeod ( Website / Twitter )

Mark Wilcock ( Website / Twitter )

Bob Phillips ( Twitter )

Dave Lawrence ( Website / Twitter )

Tim Jones ( Website / Twitter )

Jean-Pierre Riehl ( Twitter / Website )

Gary Short ( Twitter / Website )

Satya Jayanty ( Twitter / Website )

Ric Howe ( Twitter / Website )

If I have missed anyone, it will be a genuine oversight due to a very tired little Jen missing things out!

I owe people emails so please forgive me until I catch my breath! Please bear with me. I’m doing my best.

Love always,

Jen Stirrup

David McCandless announced as PASS BAC keynote speaker

The Professional Association of SQL Server announced this morning that acclaimed data visualization expert, TedTalk speaker, and Information Is Beautiful author David McCandless will keynote at the 2nd annual PASS Business Analytics Conference in San Jose, CA, May 7-9.

 
David will take the stage on Day 2 for a journey through the world of visualizing facts, data, ideas, and statistics. Microsoft’s Kamal Hathi and Amir Netz will kick off the conference on Day 1 with insights into how Microsoft is making data more accessible through easy-to-use tools including Power BI. Exciting times, people!
 
The PASS BA Conference – featuring 65+ sessions across five topic tracks – brings together business analysts, data scientists, and BI and IT professionals to connect with each other, share experiences, and learn more about the power of data to transform business. You can find all the details at http://passbaconference.com. On a minor note, I’m speaking too during a general session (not the keynote, obviously!) and I hope to see you there!
 
I am REALLY looking forward to David McCandless speak, and it would be wonderful to meet him in person. I hope you can make it to PASS BAC. If you are looking for a discount code, here is mine for you to use to get $150 off: BASF2O
 

PASS Business Intelligence and Data Warehousing Virtual Chapter have a Surface Pro to give away to one of our lucky attendees!

The PASS Business Intelligence and Data Warehousing Virtual Chapter have a Surface Pro to give away to one of our lucky attendees!

Who are we? Take a look at bi.sqlpass.org We provide free Business Intelligence and Data Warehousing webinars to PASS community members. Free to sign up, free to listen, free to learn, share and network. 

At last year’s PASS Summit, the BI VC had more registrations than any other VCs. That makes our attendees pretty awesome! We’d like to reward your loyalty by offering a prize, kindly offered by PASS in recognition of the great participation that we see from our members.
What you need to do to enter the Surface Pro giveaway…

1.    You should send a short email by 20thJanuary to passdwbivc@sqlpass.orgwith some feedback about the BI VC. Tell us whatever you like about the BI VC and what we could do to improve it.  Sample questions you could think of answering: What do you like? What don’t you like? What would you like to see more of?
2.    Attend one of the three webinars, and you can see the details in this month’s email.

Terms and Conditions
We are not responsible for Entries that we do not receive for any reason, or for Entries that we receive but are not decipherable for any reason. The winner will be chosen randomly from the attendees who fulfil the criteria mentioned earlier.
We will automatically disqualify:
• Any incomplete or illegible Entry; and
• Any Entry that contains Inappropriate content
• Any Entries that we receive from you that are received after the closing date of the 20th January


The winner will be announced in our next newsletter for February. Good luck, everyone! 

How do you choose the right data visualisation in Power BI to show your data?

How do you choose the right visualisation to show your data? Usually the customer wants one thing, the business user want something else, the business sponsor wants something flashy…. and it’s hard to tease out the requirements, and that’s before you’ve even opened up Power BI such as Power View, Excel, Tableau or whatever your preferred data visualisation software.

In other words, there are simply too many charts to choose from, and too many requirements to meet. Where do you start?

I found this fantastic diagram which can help you to choose the right visualisation. I’m often surprised to see that people haven’t seen this before. Note: this diagram was done by Andrew Abela of Extreme Presentation and the source is here and his email address is on the slide, so be sure to thank him if you’ve found it useful. If you can’t see it very well, click here to go to the source.

choosing-a-good-chart-09_001

Chart Choosers should not replace common sense, however, and Naomi Robbins has written a nice piece here which is aimed at the wary. However, diagrams like Abela’s can really help a novice to get started, and for that, I’d like to thank him for his work.

How does it related to Microsoft’s Power BI? If you look at the visualisations that are available in Power View, you can see that most of the visualisations in the diagram are available in Power BI.  The ones that are excluded are the 3D graphs, circular area charts, variable width charts, or the waterfall chart.

Why no 3D? I personally hope that Microsoft will leave 3D out of Power BI tools, unless of course it is in Power Map.  With 3D on a chart, it is harder to identify the endpoints, and it can take us longer. It might also mean that points are occluded. If you’re interested and want to see examples, here is one by the Consultant Journal team or you can go ahead and read Stephen Few’s work. If you haven’t read anything by Stephen Few, get yourself over to his site right now. You won’t regret it. Why is it different from Power Map? 3D maps provide context, and they are the exception where I will use 3D for a data visualisation showing business data. I’m obviously excluding other types of non-business data here, such as medical imaging and so on.

Why no circular area or variable width charts? I am not a fan of variable width of circular area because we aren’t very good at evaluating area when we look at charts and graphs, and Robert Kosara has an old-but-good post on this topic here.

This blog is mainly for me to remember stuff but I hope it helps someone out there too.

Best Wishes,
Jen

The Accidental Business Intelligence Project Manager

My slide deck from my ‘Accidental Project Manager’ PASS BI VC Chapter are below.

I want to apologise that you couldn’t see the slide deck. I have no idea why that was. GoToWebinar was showing that I was ‘on air’ and only gave me the option to ‘stop showing screen’. I am bewildered and very apologetic.
I also want to say thank you to Julie Koesmarno, who bravely called me up to say that I had a screen problem. Thank you Julie! Incidentally I was very glad that Julie won Outstanding PASS Volunteer I am sure you’ll agree with me that she deserves it.
Also a big Thank You to everyone who attended. Despite the technical difficulties seeing the screen, not one person dropped off. 
As an aside, I am working on a template Project Plan, created in Microsoft Project, that i am looking to share with the PASS community. Would this be helpful to you? If so, please let me know in the comments box and I will make sure that you get a copy. It’s a template that I use myself at Copper Blue Consulting and whilst it is not ‘one size fits all’ for every project, it will give you a bit of a start.
Thanks again!
Regards,
Jen