Simple explanation of a t-test and its meaning using Excel 2013

24 Aug

For those of you who say that stats are ‘dry’ – you are clearly being shown the wrong numbers! Statistics are everywhere, and it’s well worth understanding what you’re talking about when you talk about data, numbers, or languages such as R or Python for data and number crunching. Statistics knowledge is extremely useful, and it is accessible when you put your mind to it!

So, for example, what does a pint of Guinness teach us about statistics? On a visit to Ireland, Barack Obama declared that the Guinness tasted better in Ireland, and that the Irish keep the ‘good stuff’ to themselves.
Can we use science to identify whether there is a difference between the enjoyment of a pint of Guinness consumed in Ireland, and pints consumed outside of Ireland?
A Journal of Food Science investigation detailed research where four investigators travelled around different countries in order to taste-test the Guinness poured inside and outside of Ireland. To do this, researchers compared the average score of pints poured inside of Ireland versus pints poured outside of Ireland.

How did they do the comparison? They used a t-test, which was devised by William Sealy Gosset, who worked at the Guinness brewery as a scientist, with the objective of using science to produce the perfect pint.
The t-test helps us to work out whether two sets of data are actually different.
It takes two sets of data, and calculates:

the count – the number of data points
the mean, also known as the average i.e. the sum total of the data, divided by the number of data points
The standard deviation – tells you roughly how far, on average, each number in your list of data varies from the average value of the list itself.
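
If you'd rather see those three numbers outside of Excel, here is a minimal Python sketch. The scores are invented purely for illustration (they are not the study's data), and `summarise` is just a helper name I've made up. It divides by n - 1 for the standard deviation, which is what Excel's STDEV does.

```python
import math

def summarise(data):
    """Return the count, mean and sample standard deviation of a list of numbers."""
    n = len(data)                 # the count: number of data points
    mean = sum(data) / n          # the mean: sum total divided by the count
    # Sample standard deviation: divide by n - 1, as Excel's STDEV does.
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    return n, mean, math.sqrt(variance)

# Hypothetical enjoyment scores, invented for illustration only.
ireland = [74, 80, 68, 77, 71]
elsewhere = [60, 55, 65, 58, 62]

print(summarise(ireland))
print(summarise(elsewhere))
```

The two means look different here, but the means alone don't tell you whether that gap is bigger than the natural spread in the scores.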

The t-test is a more sophisticated test that tells us whether the means of those two groups of data are different.

In the video, I go ahead and try it, using Excel formulas:

COUNT – counts up the number of data points. The formula is simply COUNT, and we calculate this for each group of data.
AVERAGE – this is calculated using Excel’s AVERAGE formula.
STDEV – again, another Excel calculation, to tell us the standard deviation.
TTEST – the Excel calculation, which wants to know:

Array 1 – your first group of data
Array 2 – your second group of data
Tail – do you know if the mean of the second group will definitely be higher OR lower than the first group, and it’s only likely to go in that direction? If so, use 1. If you are not sure if it’s higher or lower, then use 2.
Type –
if your data points are related to one another in each group, use 1
if the data points in each group are unrelated to each other, and there are equal variances, use 2
if the data points in each group are unrelated to each other, and if you are unsure if there are equal variances, use 3

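And that’s your result. But how do you know what it means? It produces a number, called p, which is a probability: how likely it is that you would see a difference at least this big purely by chance, if there were really no difference between the two groups.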

The t-test is a simple way of establishing whether there are significant differences between two groups of data. It uses the null hypothesis: this is the devil’s advocate position, which says that there is no difference between the two groups. It uses the sample size, the mean, and the standard deviation to produce a p value.
The lower the p value, the stronger the evidence that there is a difference in the two groups i.e. that something happened to make a difference in the two groups.

In social science, the threshold for p is usually set to 5% i.e. if there were really no difference, a result like this would turn up by chance only 1 time in 20.
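
For the curious, here is a small Python sketch of how a t statistic turns into a p value, and how that gets compared against the 5% threshold. It is only an approximation under my own assumptions: it integrates the Student's t density numerically with Simpson's rule, using nothing but the standard library. The names `t_pdf`, `p_value` and `ALPHA`, and the example t and degrees of freedom, are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + x * x / df))

def p_value(t, df, tails=2, steps=2000):
    """Approximate p value for a t statistic (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    # Simpson's rule for the area under the density between 0 and |t|.
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3
    one_tail = 0.5 - central_area   # area beyond |t| on one side
    return one_tail * tails

ALPHA = 0.05  # the usual 5% threshold

# Hypothetical t statistic and degrees of freedom, for illustration only.
p = p_value(5.15, 8)
print(p, "significant" if p < ALPHA else "not significant")
```

With `tails=1` the function returns the area in one tail only, which corresponds to the case where you already know which direction the difference should go.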
In the video, the first comparison does not have a difference ‘proven’, but the second comparison does.
So next time you have a good pint of Guinness, raise your glass to statistics!

 


 

Love,
Jen

Roundup of online Windows 10 Resources for IT Professionals and Developers

17 Aug

Here’s a handy list of online videos to help you to get on your way with Windows 10.

Enjoy!

IT Professionals

Getting Started with Windows 10 for IT Professionals

I’d also look on Channel 9, and you can find a great start on Windows 10 at Channel 9 here.

Developers

What’s new for Windows 10

Getting Started with Windows 10 for developers

Designing with Windows 10, for developers

Developing with Windows 10

Publishing your app with Windows 10


My handy toolkit for my Azure IoT Project – how the Microsoft Partner Network can help

15 Aug

In this series, I’m writing a bunch of very practical posts on helping you through an IoT project. There are plenty of other posts about the ‘why’ and the marketing buzz, but this is about the ‘do’.

If you are using Azure, the chances are that you might be a Microsoft Partner already. There are some useful goodies in there, and you may not be aware of these opportunities. The benefits of the Microsoft Partner network can be found here. However, it can be hard to relate the list to actual projects, and this blog is aimed at translating these benefits into something tangible that can help you on your IoT project. Firstly, though, take a look at the Action Pack subscription video in order to get some background:

How can this help you to start on your IoT project? Well, if you are starting out on IoT and Azure, then the first thing you’ll need are some handy free Azure credits. Now, if you have an MSDN subscription, then you will also have free Azure credits. Did you know that you can get free credits as part of your Microsoft Partner Action Pack subscription as well? Members of the Microsoft Action Pack program receive monthly credits of £65 of Azure at no charge, and the terms and conditions can be found here.

In practice, this means that you can set up two subscriptions for your Azure account; one for MSDN, and the other for your Microsoft Partner Azure credits.

To help you start out on your Azure project with IoT, you can get five internal use licenses for Office365. This is extremely useful, because it means you can download the Office software. So, in my projects, I recommend my customers become a partner with the Action Pack subscription since they will get one of the following:

  • Microsoft Office 365—either five seats Office on-premises and five Microsoft Office 365, or 10 seats Office on-premises. You can earn more seats of Office 365 after an additional cloud sale.
  • Microsoft Dynamics CRM—no Microsoft Dynamics CRM Online licenses are granted at the subscription point. These licenses are granted after you close one Microsoft Dynamics CRM Online deal or at least 50 seats of Office 365 in the previous 12 months.

For your IoT project, the first option is particularly useful in the following scenarios:

If you have taken on new team members to do an AzureML project, then you are going to need Office software such as Excel, in order to view data. If people are choosing a career in AzureML, then you can make a safe bet that these team members will want to use the latest and greatest technology. This means that giving them Excel 2007, for example, isn’t going to work. Happy team members produce better results, and it’s important to empower them with the tools that they need, and *want*, to do a job that they are proud of doing.

If you have Office365, then you can hook up your data nicely so that you can see and share it in Power BI.

What is your call to action?

  • Sign up for the Microsoft Partner Network, and enrol for the cloud programs
  • Sign up as an Action Pack Subscriber
  • Make sure to look at your benefits, and you’ll see the Azure subscription credits and your Office365 licence keys. To do this, go to Resources, and then look for ‘Access my software and cloud benefits’.

Using my Partner Azure credits and my MSDN credits means that I have two separate subscriptions for paying for Azure. In my case, I have a subscription for my own Virtual Machines for development, and then a different subscription for my Proof of Concept work and the portfolio I’m building for demonstrations. It helps me to keep an eye on how much credit I’m “spending” on Azure VMs for development work. At the moment, I have a few physical servers which I *used* to use for development, but I like the portability of having everything in the cloud. It will mean I don’t have to lug my heavy Dell mobile workstation around with me. For demonstrations, I can video my demos in advance in case I can’t access the cloud for some reason. If I find I’m using up a lot of Azure credits and paying money, then I need to decide whether to purchase another physical machine, or stick with Azure. So far, Azure is winning on cost, and on factors such as performance, reliability, and ease of use. Running a small business and being on the PASS Board mean that I’m incredibly busy, and I need to be careful how I spend my time and effort. As you can understand, doing a lot of tech support may not be the best use of my time – even though I do enjoy it!

Now, you’re ready to move to the next step! There are a range of choices for architecting an IoT project, and I will talk about some of these issues in my next blog post.

My handy toolkit for my Azure IoT Project – Starting Out

5 Aug

As some of you know, I’m part of an Internet of Things (IoT) project. IoT is the latest buzzword, but honestly, in my opinion, it is all just data. The data may have a different velocity, and it may be fired at you in different shapes. A lot of the problems are still the same: how to store it, how to clean it, how to interpret it and analyse it.

The added complexity for me in IoT, from the perspective of a Business Intelligence specialist, is that I am not familiar with devices or any of the communication stacks for transferring data. I am learning very fast, and I’m glad to say that I’m surrounded by a great team who explain the details very clearly. I am learning about all sorts, from the details of electrical engineering through to protocols. I am enjoying the challenge. I have learned a lot, and I’ll try to share through my blog as the journey progresses.

In the meantime, I thought I’d share some of the tools and IP that I’ve been garnering as I go along this journey. This page is likely to be updated as we go along, so please keep checking back. Please note that I am not affiliated with any of these vendors in any way, and if you find other tools, please do share them with me.

Management of Azure for my IoT project

For the purposes of managing Azure storage, applications and diagnostics, I use the Azure Management Studio by Cerebrata Software, which is ultimately owned by Red Gate Software. Why do I like it? Well, you can find a detailed list of the features here but here’s why I like using it in practice:

  • I find that the interface is clean and crisp, and I can navigate it easily. I don’t have to think about using the technology, to get it to do what I want.
  • In an IoT project, particularly during the research phase, you’re not really sure how much data is being emitted. It may be more or less than predicted. With the Azure Management Studio, I can keep an eye on my storage – and therefore my costs.
  • With any project which involves early adoption of new technology, it’s important that key stakeholders are reassured about the performance and reliability of the technology. The Azure Management Studio has a series of dashboards, and this helps me to tell the story of Azure to key stakeholders.

Management of IoT Projects

I prefer to use Microsoft Project to manage projects. I have built a default IoT Project Plan using Microsoft Project, and I tailor it for each project that I’m involved in. I use Project Online with Office365 because I like having everything in one place, and it is easy to share it. I use Excel to list out tasks for people who don’t have Project, or don’t know how to use it. If people are interested, I can share the files.

If you don’t have access to Microsoft Project, then I recommend Zoho Projects. Quite frankly it’s an undiscovered gem. It’s free for one project, or you can pay $20 per month for 20 projects. This is cloud software at its best; functional, cheap, flexible and pay as you go. At that price, you’d be crazy not to try at least the free version.

I also use Trello and Wunderlist to manage tasks: I use Trello because the other team members seem to like it. I use Wunderlist more as a brain dumping ground, and I don’t share that with anyone.

I also like JIRA to log bugs, workflows and so on. I’ve been using the online version for years, and there is really no substitute for it.

If you’re thinking of starting an IoT project, or want to know more, then please email us at IoT@DataRelish.com and I will see if I can help you, or I can perhaps put you in touch with other people who can help.

Upcoming Microsoft Azure webinars on Azure Machine Learning and Cortana Analytics

31 Jul

I found these webinars over at the Microsoft site, and I’m reposting them here for you:

Introduction to Azure Data Factory with Wee Hyong Tok, Senior Program Manager at Microsoft
August 4, 2015 at 10am PDT 

This webinar is held by Microsoft, and I recommend you tune in if you want to learn more about Azure Data Factory. It enables you to process on-premises data like SQL Server, together with cloud data like Azure SQL Database, Blobs, and Tables. Wee Hyong Tok will help you to understand Data Factory capabilities, and the scenarios where Data Factory can be applied. Click here to register.

If you want to translate the time for this webinar into your own timezone, please click here.

Harness Predictive Customer Churn Models with Cortana Analytics Suite with Wee Hyong Tok, Senior Program Manager at Microsoft
August 18, 2015 at 10am PDT

This webinar is held by Microsoft, and I will be tuning in so I can drink all the good Cortana Analytics goodness!

In this session, Wee Hyong Tok will show you how to build a real-life churn model with Azure Machine Learning, make it enterprise-ready with Azure Data Factory, and deliver data insights with Power BI. Click here to register.

If you want to translate the time for this webinar into your own timezone, please click here.

Click on the image for the original Cortana announcement at WPC15.

Convincing your HiPPO at EARL Conference in London!

29 Jul

I’m delighted to be speaking at the EARL Conference to be held in London on the 14th – 16th September. What’s my topic?

Convince your HiPPO with Real world Data Storytelling in R and Machine Learning

In a world where the HiPPO’s (Highest Paid Person’s Opinion) is final, how can we use technology to drive the organisation towards data-driven decision making as part of their organizational DNA? R provides a range of functionality in machine learning, but we need to expose its richness in a world where it is made accessible to decision makers. Using Data Storytelling with R, we can imprint data in the culture of the organization by making it easily accessible to everyone, including decision makers. Together, the insights and process of machine learning are combined with data visualisation to help organisations derive value and insights from big and little data.

In this session, we will use R and cloud-based technologies in order to explore and analyse data using machine learning and statistical packages functionality, and we will look at our results. Then, we will look at how we disseminate the results to the HiPPO audience, using best practices in data visualisation and R, informed by gurus such as Stephen Few and Edward Tufte.

If you want to know how to demystify R and the insights you’ve found during your analyses, join this session in order to learn about machine learning as a technology and a discipline, and how to make the most of your insights using best practice data visualisation. Using real-life scenarios, this session will help you to communicate the insights of your data to your HiPPO, thereby helping to move your organisation towards a data-driven culture.

R now ranks as the sixth most popular programming language  – its move from last year’s 9th place reflecting the growing importance of data analytics to an increasing number of industries and sectors.  EARL offers a unique opportunity to discover how R is being used commercially to provide a wealth of business solutions.

EARL London will feature :

  • Presentations from over 40 R gurus and Business leaders
  • Speakers represent a broad range of industry sectors  – including:  insurance, manufacturing, customer analytics, life sciences, finance etc
  • Sessions include:  Data Visualisation, Business Challenges, Big Data Technologies, Modelling, Workflow and Commercial Applications
  • Keynote speakers:  Alex Bellos, Dirk Eddelbuettel, Joe Cheng and Hannah Fry
  • Speakers representing Companies such as Shell, KPMG, AstraZeneca, Lloyd’s of London, UBS and Hewlett Packard
  • Pre Conference workshops on:  Interactive Reporting with R Markdown and Shiny (now SOLD OUT), An Introduction to Rcpp,  Integrating R and Python, and Current Best Practices in Formal Package Development
  • Sponsors and Exhibitors including Revolution Analytics, RStudio, Hewlett Packard, Teradata, Oracle, Harnham UK, Plotly, Tessella, Information Builders and O’Reilly
  • Sensational central London venue
  • 3 Conference Networking events including the Main Conference Reception in the amazing walkways of London’s iconic Tower Bridge

If you have yet to purchase your ticket please don’t delay to avoid disappointment. Tickets can be purchased online via credit card or can be invoiced if required. Group discounts are available to companies sending >5 attendees – please email earl-team@mango-solutions.com for more information.

Cortana Analytics News Roundup

27 Jul

If you missed Joseph Sirosh’s webinar on Cortana Analytics, click here to view the webinar. If you don’t have time for that, then click here to download a copy of the slides that were presented.

If you are interested in a review by an industry analyst, I’d recommend Andrew Brust’s article on Cortana from ZDNet. Mary Jo Foley’s original ZDNet article on the announcement from WPC is also a good read.

If you are interested in in-person events, then be sure to sign up for the Cortana Analytics Workshop planned for September 10-11, 2015 at the Microsoft campus in Redmond. I wish I could attend, but unfortunately that’s not possible.

Also keep a watch on the Machine Learning Blog site, where the team will be publishing more information on an ongoing basis.
