#MSBuild Azure Open Datasets

Nearly three years ago, I complained bitterly about the demise of Windows Datamarket, which aimed to provide free, stock datasets for any and every purpose. I was a huge fan of the date dimension and the geography dimension, since they really helped me to get started with data warehousing.

So I’m glad to say that the concept is back, revamped and rebuilt for today’s data scientists. Azure Open Datasets will be useful to anyone who wants data for any reason: perhaps for learning, for demos, or for improving machine learning accuracy.

The purpose of Azure Open Datasets is to increase data scientists’ productivity by reducing the time normally spent on data discovery and preparation.

Azure Open Datasets is available in preview, so why not take a look? Datasets are cohosted with cloud compute in Azure, making access and manipulation easier. Even better, you can work with them directly from Notebooks!
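For example, here is a minimal Python sketch of pulling one of the open datasets into a pandas DataFrame from a notebook. It assumes the preview azureml-opendatasets package is installed and that the Public Holidays dataset is available; since the service is in preview, the API may change, so check the current documentation.

```python
# Minimal sketch: load the Public Holidays open dataset into pandas.
# Assumes the preview "azureml-opendatasets" package is installed
# (pip install azureml-opendatasets); the preview API may change.
from datetime import datetime
from azureml.opendatasets import PublicHolidays

holidays = PublicHolidays(start_date=datetime(2019, 1, 1),
                          end_date=datetime(2019, 12, 31))

df = holidays.to_pandas_dataframe()  # pull the hosted data into a DataFrame
print(df.head())
```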

To learn more, join my Build session on Wednesday 8th May at 2pm in the Sheraton Grand Ballroom D. I’ll show you how it works.

Please head over and give it a try. I’m happy to see the concept is back.

European Data Science & AI Awards 2019 Entry details

The European DatSci & AI Awards, in collaboration with the BDVA and CeADAR, are now accepting entries!

I’m delighted to announce that I am on the Judging Panel this year, along with people that I admire.

10 Awards to compete for in The European DatSci & AI Awards 2019

The Awards recognize the gold standard for Data Science & AI in Industry, Education and Social Responsibility, and connect the Data Science community across Europe.

The competition is open to teams and individuals within the Data Science and AI community from across Europe.

The entry deadline for the competition is 24th May 2019. The competition is free to enter, so check out the criteria here and feel free to share. If you are interested in learning more, sign up to the DatSci mailing list.

Check out the 2019 Categories

  • Data Scientist of the Year
  • Data Science Technology Innovation of the Year
  • Best Application of AI of the Year
  • Best use of Data Science/AI for Customer Experience
  • Best use of Data Science/AI for Health & Wellbeing
  • Best use of Data Science/AI for Industry 4.0
  • Best use of Data Science in SME/Start Up
  • Best use of Data to Achieve Social Impact
  • Best technical advance in the field of Data Science/AI from a research organisation either in academia or industry
  • Data Science Student of the Year

Important dates for this year’s Awards: 

  • March 2019 – 24th May 2019: Entry window open
  • June 2019: Judging & Finalists selected
  • July 2019: Finalists announced
  • w/c 22nd July 2019: Finalists presentations
  • 5th September 2019: Awards Day & Winners announced

Business Analytics MSc Scholarship Available:
A central initiative of the European DatSci & AI Awards is paying it forward to the next generation of Data Science talent: each year, proceeds from ticket sales fund a Scholarship covering full-time fees for a Level 9 MSc Business Analytics student at UCD Smurfit School, Dublin. Check out the details here.

Good luck!


Marketing as a strategic business partner: mixing theory, research and #data

Marketing is viewed as a key strategic participant in achieving the goals of businesses, both large and small. I thought I’d share how we started to apply marketing theory, practice, and insights from data. There are tons of ‘skate on the surface’ marketing soundbite books, which may sound good at first glance but lack depth. It’s quite a different thing to work at it so you really know it, practice it, and can share it with other people so that it is authentic. It’s about focusing on the real, not the soundbite. Real never goes out of fashion.

Research evidence has shown that consumers interact with advertising in complex ways, especially since we have such short attention spans (Weilbacher, 2003). How do organizations know which way their decisions should land? After all, the Internet can break a business very quickly! So, organizations need a cohesive strategy which aligns all methods of communication to ensure consistency (Porter, 2001).

The overall company vision can be conveyed through the brand, which acts as shorthand for that vision and serves to communicate and distinguish an organization from other organizations. Brands can be viewed as a collective of perceptions that exist in the mind of the consumer, and the process of distinguishing and identifying products is known as branding (Doyle, 1999; cited in Baker, 1999). Even in the industrial sector, where decisions are made by technical team members, brands can acquire confidence through familiarity (Levitt and Levitt, 1986).

Building a brand involves four key concepts: quality, service, innovation and differentiation (Doyle, 1999). Brands can also migrate from mature technologies to encompass new technology (Doyle, 1989). For example, over at Data Relish, we mix mature Business Intelligence technologies with new and upcoming technology such as Artificial Intelligence, which is enjoying rapid growth (Xiong, 2019).

How can you tell what’s working for your organization? Particularly in the current Brexit-dominated economic climate, it is important to identify the most targeted opportunities to maximise growth. It’s also important to collect customer feedback and references, which can be incredibly enlightening and occasionally even heartwarming! So what’s the secret?

  • The secret is the data. For example, at Data Relish, we have conducted some analyses in order to look at targeting and positioning better. It’s like having an arrow, and the segmentation is equivalent to pulling the arrow back so it gives you more power to go further.
  • It’s great to be insight-driven, but it’s also good to be data-driven and evaluate what the data actually says. It’s a balance.
  • Also, have good tools.

To do this, we used Power BI to examine the data from source systems such as HubSpot, FreeAgent and Insightly to understand better what was working from the marketing perspective. We also send the results to an independent marketing consultant for the purposes of replication and verification.

Obviously I’m not sharing the findings here, but it is safe to say that the results were enlightening. The UK Government estimates that AI could add an additional US$814 billion (£630bn) to the UK economy by 2035, increasing the annual growth rate of GVA from 2.5% to 3.9% (UK Government, 2017). By asserting experience as well as expertise, one thing that was working was that we were viewed as ‘trusted advisors’ in AI, helping to lead organizations to success by working with the C-suite to make the data work, and work hard, to add business value. It became clear that we were perceived as a proven, trusted advisor with experience in what’s perceived as a ‘young’ field, and this is a key differentiator which is proving invaluable in gaining clients in a variety of sectors. It seems that everyone is an expert in AI these days! But we could actually show it.

So, we made efforts to have real chops: to apply our research and findings to our own organization, to really analyze the data, and to apply marketing theory and practice to improving what we do, informed by the academic literature. Data can also help us to explore and elicit the reasons why consumers avoid brands (Lee et al., 2009).

I’ll blog more on the marketing journey with Power BI, HubSpot, FreeAgent and so on, but the main takeaway here is that you always learn something by having a fresh look at the data. Additionally, it’s important to get the results independently verified. If you need a hand with these concepts, don’t hesitate to get in touch over at Data Relish.

References

Doyle, P. 1999, “Branding”, in Baker, M. (ed.), The Marketing Book, 4th ed., Chartered Institute of Marketing, Oxford, UK.

Doyle, P. 1989, “Building successful brands: The strategic options”, Journal of Marketing Management, vol. 5, no. 1, pp. 77-95.

Levitt, T. & Levitt, I.M. 1986, Marketing Imagination, Simon and Schuster.

Porter, M.E. 2001, “Strategy and the Internet”, Harvard Business Review, pp. 63-78.

Porter, M.E. 2008, “The five competitive forces that shape strategy”, Harvard Business Review, vol. 86, no. 1, pp. 25-40.

Weilbacher, W.M. 2003, “How Advertising Affects Consumers”, Journal of Advertising Research, vol. 43, no. 2, pp. 230-234.

UK Government 2017, Growing the Artificial Intelligence Industry in the UK: Executive Summary. Available at: https://www.gov.uk/government/publications/growing-the-artificial-intelligence-industry-in-the-uk/executive-summary

Xiong, X. 2019, “Analysis of the Status Quo of Artificial Intelligence and Its Countermeasures”, 2018 International Workshop on Education Reform and Social Sciences (ERSS 2018), Atlantis Press.

What I’m doing this week at #MSIgnite

I’m delighted to say that I’m doing the Community Reporter role for Microsoft Ignite. This means I get to interview the Microsoft Executive Team, including Amir Netz, James Phillips and Joseph Sirosh. I have complete stars in my eyes! I don’t often get the chance to speak with them, so I’m delighted to get to do that. Also, they are very interesting and they have a lot to say on topics I’m passionate about, so make sure to tune in for those. I’ll release more details about times and how you can watch as soon as I can.

What does a Community Reporter do? During Microsoft Ignite, the Community Reporters will be your go-tos for live event updates. If you aren’t attending the conference this year, these reporters will be a great way to see what’s happening on the ground in Orlando. Check out my content on my blog here, and follow me on Twitter and LinkedIn to stay up-to-date on all things Microsoft Ignite!

I’d also like to meet some of you so when I get the chance, I’ll tweet out to see if any introverted people fancy sitting at a table with me for breakfast or lunch to talk about all things data.

I am also speaking at Ignite so here are the details:

When? Thursday, September 27 4:30 PM – 5:15 PM
Where? Room W330 West 2 

Artificial intelligence is popularized in fictional films, but the reality is that AI is becoming a part of our daily lives, with virtual assistants like Cortana using the technology to empower productivity and make search easier. What does this mean for organizations that are running the Red Queen’s race not just to win, but to survive in a world where AI is becoming the present and future of technology? How can organizations evolve, adapt, and succeed using AI to stay at the forefront of the competition? What are the potential issues, complications, and benefits that AI could bring to us and our organizations? In this session, we discuss the relevance of AI to organizations, along with the path to success.

 

Microsoft Power BI, Microsoft R and SQL Server are being used to help tackle homelessness in London by providing actionable insights to improve the prevention of homelessness, as well as the processes in place to help victims. Join this session to see how Microsoft technologies are helping a data science department to make a difference to the lives of families, by revealing insights into the contributors to homelessness in families in London and the surrounding area, and to understand more about finding stories in data. The case study also demonstrates the practicalities of using Microsoft technologies to help some of the UK’s most vulnerable people using data science for social good.

When? Thursday, September 27 2:15 PM – 3:30 PM
Where? OCCC W222

For people who want to build careers and manage teams, it is crucial to understand diversity and how it impacts your organization. Increasing the role of women in technology has a direct impact on the women working in hi-tech, but the effects can go far beyond that. How do female tech workers influence innovation and product development? How do men benefit from having more women working in technology? Can the presence of women in tech affect a company’s profit? Join a lively discussion on diversity, and hear proactive steps that individuals and companies can take in order to make diversity and inclusion part of the organizational DNA.

One last thing!

Remember to download the Microsoft Ignite app to have your information handy on-the-go!

See you there!

 

 

Fun DataDive with DataKind UK

This weekend, I volunteered with DataKind UK on their Summer DataDive, which took place on the weekend of 28th and 29th July 2018 in the Pivotal London offices in Shoreditch. I had a fantastic, memorable weekend, mixing with around 200 other data scientists.

I’d like to thank the DataKind team for being so inspirational, giving, and kind with their time and skills. I’d like to emphasise my absolute admiration for the Data Ambassadors and the work that they do to lift everyone up.

Why did I do this? DataKind appealed to me since it meant that I could sharpen my data science skills by pitching in with experts. New learners to Data Science are welcome, and there were also newbies who had some experience of data and wanted to know more. There was room for everyone to contribute, so if you are a newbie, it would be a great way to join in the conversations and learn from experts who love what they can achieve with data. Plus, it’s a great opportunity to mix with real data scientists. This isn’t Poundland data science, and this is not pseudo Data Science. This is the real thing: I spent two days immersed in real problems, using Data Science as a solution. I learned a lot, and I contributed as well. There is a saying that you are the average of your friends, and I needed to get close to more Data Scientists so that I could build on my earlier experience of AI and bring it up to date.

I wanted to help a charity, by dedicating my time and skills, to support women and girls who need it. I understand that there are vulnerable men too, but this isn’t about whataboutism. Women and girls are disproportionately affected by issues such as domestic violence and being the victims of sexual crimes, and I wanted to do something practical to help.

For my specific contribution, I worked with a team of 25 other data scientists on finding insights in data belonging to Lancashire Women’s Centre. The vision of Lancashire Women’s Centre is that all women and girls in Lancashire are valued and treated as equals. Their aim is to empower women and girls to be able to transform their lives by bringing them together to find their voice, share experiences and understanding, develop their knowledge and skills, and challenge stereotypes and misconceptions about them, so that they can have choices in becoming the individuals they want to be. I share this conviction deeply and I wanted to help.

You may well be thinking that the charity helps only a small number of women, but that’s not the case at all. They have a real impact in their community. Lancashire Women’s Centre has helped over 3,000 women in the last year, including 5,807 hours of therapeutic support accessed by 1,154 women and 78 men. Following therapy, 25% were no longer taking medication, 8% felt the support had helped them find and keep a job, and 12% continued to access LWC services to support their recovery.

So what did I do? I can’t share specific details because the data is confidential, and it obviously concerns some of the UK’s most vulnerable women and girls. I will say that the tools used to work with the data were CoCalc, R, Python, Excel, Tableau and Power BI.

DataKind™ brings high-impact organizations dedicated to solving the world’s biggest challenges together with leading data scientists to improve the quality of, access to and understanding of data in the social sector. This leads to better decision-making and greater social impact. Launched in 2011, DataKind leads a community of passionate data scientists, visionary partners and mission-driven organizations with the talent, commitment and energy to use data science in the service of humanity. DataKind is headquartered in New York City and has Chapters in Bangalore, Dublin, San Francisco, Singapore, the UK and Washington DC. More information on DataKind, our programs and our partners can be found on their website: www.datakind.org

Lancashire Women’s Centre

DataKind: Jen Stirrup and team

I’m the one on the right, wearing orange!

I’m looking forward to the next one!

Modelling your Data in Azure Data Lake

One of my project roles at the moment (I have a few!) is architecting a major Azure implementation for a global brand. I’m also helping with the longer-term ‘vision’ of how that might shape up. I love this part of my job and I’m living my best life doing this piece; I love seeing a project take shape until the end users, whether they are business people or more strategic C-level, get the benefit of the data. At Data Relish, I make your data work for different roles in organizations of every purse and every purpose, and I learn a lot from the variety of consulting pieces that I deliver.

If you’ve had even the slightest look at the Azure Portal, you will know that it has oodles of products that you can use in order to create an end-to-end solution. I selected Azure Data Lake for a number of reasons:

  • I have my eye on the Data Science ‘prize’ of doing advanced analytics later on, probably in Azure Databricks as well as Azure Data Lake. I want to make use of existing Apache Spark skills and Azure Data Lake is a neat solution that will facilitate this option.
  • I need a source that will cater for the shape of the data…. or the lack of it….
  • I need a location where the data can be accessed globally since it will be ingesting data from global locations.

In terms of tooling, there are always the Azure Data Lake Tools for Visual Studio, and you can watch a video on this topic here. But how do you get started with the design approach? How do I go about designing solutions for Azure Data Lake? There are many different approaches, and I have been implementing Kimball methodologies for years.


With this particular situation, I will be using the Data Vault methodology. I know that there are different schools of thought, but I’ve learned from Dan Lindstedt in particular, who has been very generous in sharing his expertise; you can find Dan’s website here. I have previously delivered this methodology for an organization with billions of USD in turnover, and they are still using the system that I put in place; it was a particularly helpful approach for an acquisition scenario, for example.

 

Building a Data Vault starts with the modeling process, and this starts with a view of the existing data model of a transactional source system. The purpose of the data vault modelling lifecycle is to produce solutions for the business faster, at lower cost and with less risk, which also have a clearly supported afterlife once I’ve moved on to another project for another customer.

 

Data Vault is a database modeling technique where the data is considered to belong to one of three entity types: hubs, links, and satellites:

  • Hubs contain the key attributes of business entities (such as geography, products, and customers).
  • Links define the relations between the hubs (for example, customer orders or product categories).
  • Satellites contain all other attributes related to hubs or links, including all attribute change history.

The result is an Entity Relationship Diagram (ERD), which consists of Hubs, Links and Satellites; a small sketch of the three entity types follows below. Once I’d settled on this methodology, I needed to hunt around for something to use.
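To make the three entity types concrete, here is a small, illustrative Python sketch. The entity names, keys and hashing scheme are hypothetical choices of my own for illustration, not the customer’s model, but they show how a hub, a satellite and a link row might be derived from a single source record.

```python
# Illustrative Data Vault sketch (hypothetical names and keys).
# Hubs hold business keys, satellites hold descriptive attributes and their
# history, and links relate hubs to each other.
import hashlib
from datetime import datetime, timezone


def hash_key(*parts: str) -> str:
    """Build a deterministic surrogate key from business key parts."""
    return hashlib.sha1("||".join(parts).encode("utf-8")).hexdigest()


load_date = datetime.now(timezone.utc).isoformat()
source_row = {"customer_id": "C-1001", "name": "Contoso Ltd", "order_id": "O-9001"}

# Hub: the business key of an entity, plus load metadata.
hub_customer = {
    "customer_hk": hash_key(source_row["customer_id"]),
    "customer_id": source_row["customer_id"],
    "load_date": load_date,
    "record_source": "CRM",
}

# Satellite: descriptive attributes attached to the hub; history is kept
# by loading a new satellite row whenever an attribute changes.
sat_customer = {
    "customer_hk": hub_customer["customer_hk"],
    "name": source_row["name"],
    "load_date": load_date,
    "record_source": "CRM",
}

# Link: the relationship between the customer hub and an order hub.
link_customer_order = {
    "link_hk": hash_key(source_row["customer_id"], source_row["order_id"]),
    "customer_hk": hub_customer["customer_hk"],
    "order_hk": hash_key(source_row["order_id"]),
    "load_date": load_date,
    "record_source": "ORDERS",
}

print(hub_customer, sat_customer, link_customer_order, sep="\n")
```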

How do you go about designing and using an ERD tool for a Data Vault? I found a few options. For the enterprise, there is WhereScape® Data Vault Express. That looked like a good option, but I had hoped to use something open-source so other people could adopt it across the team. It wasn’t clear how much it would cost, and, in general, if I have to ask then I can’t afford it! So far, I’ve settled on SQL Power Architect so that I can get the ‘visuals’ across to the customer and the other technical team, including my technical counterpart at the customer who picks up when I’m at a conference. This week I’m at Data and BI Summit in Dublin, so my counterpart is picking up activities during the day, and we are touching base during our virtual stand-ups.

So, I’m still joining dots as I go along.

If you’re interested in getting started with Azure Data Lake, I hope that this gives you some pointers on the design process.

I’ll go into more detail in future blogs, but I need to stop writing this blog and do some work!

Useful Data Sources for Demos, Learning and Examples

One question that pops up from time to time is the question over sample datasets for use in self-learning, creating training materials or just for playing with data. I love this question: I learn by actively trying things out too. I love the stories in the data, and this is a great way to find the stories that bring the data to life, and offer real impact.


Since I deliver real projects with customer impact, I can’t demonstrate any real customer data during my presentations because my projects are confidential, so I have three approaches:

  • I use sample data, and I have a signed NDA.
  • I ask the customer for their data, anonymised, and have a signed NDA.
  • I use their live data and have a signed NDA.

If the customer elects the first option, then I use sample data from the sources below.

To help you get started, I’ve cobbled together some pointers here, and I hope it’s useful. Please feel free to leave more ideas in the comments.

Entrepreneur

The latest edition of Entrepreneur has an insightful article on open source (frameworks vs libraries), and it has some good pointers to datasets at the bottom of the page: https://www.entrepreneur.com/article/310965

Bernard Marr has an updated list of datasets here, on Forbes. I’m not going to steal Marr’s list, so I recommend that you head over to his page, where you’ll find sixty-plus options.

Data Source Direct Connectivity to R, Python, Ruby and Stata

R has a number of APIs that connect to public datasets, e.g. the World Bank’s DataBank, which allows connectivity from R, Python, Ruby and Stata. I used this for my recent demos at the Power BI event in the Netherlands, and it worked sweetly. You write your script to call the package, embed it in Power BI, and it will go and get the data for you. I then create the chart in R and put it into the Power BI workbook.
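My demos used an R package inside Power BI, but the same data is reachable from Python too. Here is a minimal, illustrative sketch using pandas-datareader’s World Bank module; the indicator and country codes are just examples.

```python
# Minimal sketch: fetch a World Bank indicator into pandas.
# Requires: pip install pandas-datareader
# The indicator and country codes are illustrative examples.
from pandas_datareader import wb

# Total population for the UK and the Netherlands, 2010-2018.
df = wb.download(indicator="SP.POP.TOTL",
                 country=["GB", "NL"],
                 start=2010, end=2018)
print(df.head())
```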

Quandl

Quandl offers financial data, and it has a hook so that R can connect directly to it as well.
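There is also a Python package for Quandl; here is a minimal sketch, assuming you have registered for a Quandl API key and that the example dataset code is still published.

```python
# Minimal sketch: pull a Quandl time series into pandas.
# Requires: pip install quandl, plus your own API key.
import quandl

quandl.ApiConfig.api_key = "YOUR_API_KEY"  # placeholder - substitute your key
gdp = quandl.get("FRED/GDP")               # example dataset code (US GDP)
print(gdp.tail())
```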

Kaggle

Kaggle is owned by Google, presumably so that Google can promote TensorFlow. Since people share code based on Kaggle datasets, it’s very easy to pick up code, copy it, change it, and see how it works. However, this can be an issue, since you can’t be sure that the code is correct.

Final Note

If you’re teaching or presenting using this data and/or sample code, you can be pretty sure that your training delegates have access to the Internet too, so you need to be sure that you credit people properly.

I am not mostly doing training, although I do deliver training now and again. I am a consultant first and foremost. I’m meta-tracking my time with DeskTime and with Trello, since I am measuring exactly how I spend my time, and training does not account for a high percentage; project delivery takes the majority of my time.

Although I’m a guest lecturer for the MBA program at the University of Hertfordshire, and I’m going to be a guest lecturer on their MSc Business Analysis and Consultancy course, I do not consider myself a trainer. I am a consultant who sometimes does training as part of a larger project. I haven’t gone down the MCT route because I regard training as part of a bigger consultancy offering. I never stop learning, and I don’t expect anyone else to stop learning, either.
