Dynamic Data Masking in Azure SQL Data Warehouse

I’m leading a project which is using Azure SQL Data Warehouse, and I’m pretty excited to be involved. I love watching the data take shape, and, for the customer requirements, Azure SQL Data Warehouse is perfect.

Note that my customer details are confidential, and that’s why I never give away details such as the customer name. I gain – and retain – my customers based on trust, and, by giving me their data, they are entrusting me with detailed information about their business.

One question they raised was in respect to dynamic data masking, which is present in Azure SQL Database. How does it manifest itself in Azure SQL Data Warehouse? What are the options regarding the management of personally identifiable information?


As we move ever closer to the implementation of GDPR, more and more people will be asking these questions. With that in mind, I did some research and found there are a number of options, which are listed here. Thank you to the Microsoft people who helped me to come up with some options.

1. Create an Azure SQL Database spoke as part of a hub and spoke architecture.

The Azure SQL Database spoke can create external tables over Azure SQL Data Warehouse tables in order to move data into the spoke. One note of warning: it isn’t possible to use dynamic data masking (DDM) over an external table, so the data would have to be moved into Azure SQL Database.
2. Embed masking logic in views and restrict access.

This is achievable but it is a manual process.
3. Mask the data through the ETL processes creating a second, masked, column.

This depends on the need to query the data. Here, you may need to limit access through stored procs.
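As a rough illustration of option 3, here is a minimal Python sketch of an ETL step that adds masked columns alongside the original. The column names (`email`, `email_masked`, `customer_key`), the salt value and the helper functions are all hypothetical; they are not part of any Azure API, and a real pipeline would need to agree the masking rules with the customer.

```python
import hashlib


def mask_email(email: str) -> str:
    """Keep the first character of the local part and the domain; hide the rest."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        # Not a recognisable address, so reveal nothing at all.
        return "****"
    return f"{local[0]}{'*' * (len(local) - 1)}@{domain}"


def pseudonymise(value: str, salt: str) -> str:
    """Return a deterministic token so masked rows can still be joined or grouped."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]


def add_masked_columns(rows, salt):
    """Copy each row and add the masked columns; the original column is kept."""
    masked_rows = []
    for row in rows:
        masked = dict(row)
        masked["email_masked"] = mask_email(row["email"])
        masked["customer_key"] = pseudonymise(row["email"], salt)
        masked_rows.append(masked)
    return masked_rows


# Hypothetical sample data, for illustration only.
rows = [{"customer_id": 1, "email": "jane.doe@example.com"}]
masked_rows = add_masked_columns(rows, salt="demo-salt")
print(masked_rows[0]["email_masked"])
```

Because the unmasked column still exists after this step, access to it would still need to be restricted, for example through the views or stored procs mentioned above.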
On balance, the simplest method overall is to use views to restrict access to certain columns. That said, I am holding a workshop with the customer in the near future in order to see their preferred options. However, I thought that this might help someone else in the meantime. I hope that you find something that will help you to manage your particular scenario.

How do you know if your org is ready for Data Science? Starting your journey with Azure Databricks

Mentioning data science at your company may give you an air of expertise, but actually implementing enterprise-wide transformation with data science, artificial intelligence or deep learning is a business-wide transformation activity. It impacts your data and analytics infrastructure, engineering and business interactions, and even your organizational culture. In this post, we will look at a few high-level things to watch out for before you get started, along with a suggestion that you can try Azure Databricks as a great starting point for your cloud and data science journey.

Note: not all companies are ready for data science. Many of them are still struggling with Excel. This article is meant for you.

So how can you move forward?

1. Have a data amnesty

If you’re still struggling with Excel, then data science and AI can seem pretty far off. Have a data amnesty – ask everyone to identify their key data sources so you can back them up, protect them, and share them better where appropriate.

2. Determine the Data (Im)maturity in your organization.

Take a peep at the following table: where is your organization located?

Democratization of Data

Note: this idea was inspired by Bernard Liautaud

Ideally, you’re headed towards a data democracy, where IT are happy in their guardianship role, and the users have got their data. If this equilibrium isn’t in place, then this could potentially derail your budding data science project. Working on these issues can help your success to be sustainable in the longer term.

3. All that glitters isn’t data gold

This is the ‘shiny gadget’ syndrome. Don’t get distracted by the shiny stuff. You need your data vegetables before you can have your data candy.

Focus on the business problem you’re trying to solve, not the technology. You will need to think about the success criteria.

You should be using the technology to improve a business process, with clear goals and measurable success. Otherwise it can be disorganized, with a veil of organization that is disguised by the technology.


4. Fail to plan, plan to fail

If you fail… that’s ok. You learned ten things you didn’t know before. Next time, plan better, scope better, do better.

How to get started?

The cloud is a great place to get started. It means that you’re not purchasing a lot of technology and hardware that you don’t need. Abraham Maslow was once quoted as saying “If you only have a hammer, you tend to see every problem as a nail.” Those words are truer than ever, as an increasingly complex and interconnected world makes selecting the right tools for the data estate harder. With that in mind, the remainder of this blog talks about Azure Databricks as a first step in data science for the new organization, in order to reduce risk, initial outlay and costs.

What is Microsoft Azure Databricks?

Azure Databricks was designed in collaboration with Microsoft and the creators of Apache Spark. It is designed for easy data science: one-click set up, streamlined workflows and an interactive workspace that enables collaboration between data scientists, data engineers, and business analysts. Each of these roles will have a different style of interacting with, presenting, and sharing information, bearing in mind the varying skillsets of both business users and IT.

So what is Apache Spark? According to Databricks, Apache Spark is the largest open source project in data processing. It has seen rapid adoption by enterprises across a wide range of industries.

So what does Apache Spark give you? Apache Spark is a fast, in-memory data processing engine. For the serious data science organisation, it allows developers to use expressive development APIs to work with data. Information and data workers can execute streaming analytics, longer-term machine learning or SQL workloads – fast. Implemented in Azure, it means that business users can use Power BI in order to understand their data better.

Apache Spark consists of Spark Core and a set of libraries. The core is the distributed execution engine and the Java, Scala, and Python APIs offer a platform for distributed ETL application development. Spark lets you quickly write applications in Java, Scala, or Python. It comes with a built-in set of over 80 high-level operators. And you can use it interactively to query data within the shell.

The Apache Spark functionality is incorporated in Azure Databricks

In addition to Map and Reduce operations, it supports SQL queries, streaming data, machine learning and graph data processing. Developers can use these capabilities stand-alone or combine them to run in a single data pipeline use case.
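For readers new to the Map and Reduce pattern mentioned above, here is a minimal sketch in plain Python. It deliberately uses only the standard library rather than any Spark API, and the sample lines and function names are made up for illustration. Spark applies the same idea, but distributes the map and reduce steps across a cluster.

```python
from collections import Counter
from functools import reduce


def map_phase(line: str) -> Counter:
    """Map step: turn one line of text into per-word counts."""
    return Counter(line.lower().split())


def reduce_phase(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial counts into one."""
    return left + right


# Hypothetical input; in a real job this would be a large distributed dataset.
lines = ["spark is fast", "spark is in memory"]

# Each line is mapped independently, then the partial results are combined.
word_counts = reduce(reduce_phase, map(map_phase, lines), Counter())
print(word_counts.most_common(2))
```

Because each line is mapped independently and the merge step does not care about order, the work can be split across many machines, which is what makes the pattern a good fit for a distributed engine.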

It supports in-memory processing to boost the performance of big-data analytic applications, and it works with other Azure data stores such as Azure SQL Data Warehouse, Azure Cosmos DB, Azure Data Lake Store, Azure Blob storage, and Azure Event Hub.

What is so special about Apache Spark, anyway?

For the enterprise and data architects, it can give you the opportunity to have everything in one place: streaming, ML libraries, sophisticated analytics, data visualization. It means that you can streamline under one technological umbrella, but keep your data in other data sources such as Azure SQL Data Warehouse, Azure Cosmos DB, Azure Data Lake Store, Azure Blob storage, and Azure Event Hub.

As an architect, I aim to reduce points of failure and points of complexity, so it is the neatness of the final streamlined technology solution that is appealing.

It is also fast, and people want their data fast. Spark enables applications in Hadoop clusters to run up to 100x faster in memory, and 10x faster even when running on disk. Spark makes this possible by reducing the number of reads and writes to disk: it stores intermediate processing data in memory. It uses the concept of a Resilient Distributed Dataset (RDD), which allows it to transparently store data in memory and persist it to disk only when it’s needed. This helps to avoid most of the disk reads and writes, which are the main time-consuming factors in data processing.
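The benefit of keeping intermediate data in memory can be sketched with a toy example in plain Python. This is not Spark’s RDD API; the `LazyDataset` class and `expensive_parse` function are hypothetical stand-ins, used only to show an intermediate result being computed once and then reused.

```python
class LazyDataset:
    """Toy stand-in for a cached dataset: compute once, then reuse from memory."""

    def __init__(self, compute):
        self._compute = compute
        self._cache = None
        self.compute_calls = 0  # counts how often the expensive step really ran

    def collect(self):
        """Return the data, running the expensive step only on first use."""
        if self._cache is None:
            self.compute_calls += 1
            self._cache = list(self._compute())
        return self._cache


def expensive_parse():
    """Pretend this reads and parses a large file from disk."""
    for raw in ["3", "1", "4", "1", "5"]:
        yield int(raw)


parsed = LazyDataset(expensive_parse)

# Two downstream computations share the same in-memory intermediate result.
total = sum(parsed.collect())
largest = max(parsed.collect())
print(total, largest, parsed.compute_calls)
```

In Spark itself, the comparable step is marking an RDD for caching so that later actions reuse it instead of recomputing it from the source.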

 

Data Visualization for the Business User with Azure Databricks as a basis

Azure Databricks brings multi-editable documents for data engineering and data science in real-time. It also enables dashboards with Power BI for accurate, efficient and accessible data visualization across the business.

Azure Databricks is backed by Azure Database and other technologies that enable highly concurrent access, fast performance and geo-replication, along with Azure security mechanisms.

Summary

Implementing enterprise-wide transformation with data science, artificial intelligence or deep learning is a business-wide transformation activity. This post suggests that you can try Azure Databricks as a great starting point for your cloud and data science journey, with some advice on getting a good grounding before you start.

Book Review: Grokking Algorithms: An Illustrated Guide For Programmers and Other Curious People

Grokking Algorithms: An Illustrated Guide For Programmers and Other Curious People by Aditya Y. Bhargava

My rating: 5 of 5 stars

I’ve just finished reading the Manning book Grokking Algorithms: An Illustrated Guide For Programmers and Other Curious People.

This is a very readable book, with great diagrams and a very visual style. I recommend this book for anyone who wants to understand more about algorithms.
This is an excellent book for the budding data scientist who wants to get past the bittiness of learning pieces of open source or proprietary software here and there, and wants to learn what the algorithms actually mean in practice. It’s fairly easy to get away with looking like a real Data Scientist if you know bits of R or Python, I think, but when someone scratches the surface of that vision, it can become very apparent that the whole theory and deeper understanding can be missing. This book will help people to bridge the gap from learning bits here and there, to learning what the algorithms actually mean in practice.
Recommended. I’m expecting to find that people might ‘pinch’ the diagrams but I’d strongly suggest that they contact the author and credit appropriately.
I’d recommend this book, for sure. Enjoy!


Open Source Decency Charter Proposal for Dealing with Harassment at Technical Events


If you’re reading this, you are probably a decent person. You shouldn’t read this thinking that you will be putting yourself in danger if you attend a tech event. I can tell you that I normally feel pretty safe at these events. You can read my story here; I’ve talked about it publicly since I want to do something good with it. Note that I don’t represent any other organization or body or person with this blog. It’s another heartdump.

Most people are pretty decent but what do you do about the ones that are not? How do victims know what to do? How do you know how to help one of your friends?

The vast majority of people want to help and are decent, and that’s why I’d like to propose the creation of an open source Decency Charter to help at technical community events which need support for handling harassment at events.

A Decency Charter would outline reasonable and decent expectations for participants within a technical community event, both online and in-person, as well as steps for reporting unacceptable behavior and concerns. It’s fairly simple at heart: be decent to one another.

I think that it would be good to have something very clear in place that people can use as a template, so everyone can have a voice and feel safe. That’s why I think an open source Decency Charter is a good suggestion and I’d be interested in your thoughts.

This blog post is an attempt to bring a few strands together; namely diversity, harassment in the technical community, and a proposal for a way forward.

It’s a shame that we have to encode decency into technical events. More and more workplaces are being embroiled in sexual harassment cases. According to the Trades Union Congress (TUC) in 2017, over 50% of workplaces have had an issue with sexual harassment. I think it would be good if people could adopt a Decency Charter, since it sounds more positive than a Code of Conduct. The inspiration came from Reid Hoffman, who talked about a Decency Pledge in his article The Human Rights of Women Entrepreneurs, which discusses sexual harassment of women in the industry. I’m grateful to Reid Hoffman for his article because it does help to have male voices in these discussions. Simply put, his voice will carry further than mine, and with way more credibility.

Followers of my blog will know that I’m trying to get support for a Diversity Charter to support diversity at events. As an additional add-on, I’d like to propose a Decency Charter as well, which gives people a template that they can use and amend to monitor their event, as they see fit. I’d love your ideas and please do email me at jen.stirrup@datarelish.com with your thoughts, or leave a comment on this blog.

I am going to start to list a few things here from the viewpoint of someone whose head is bloodied, but unbowed and I want to use my voice. Everyone’s experience is different but I thought that this might help in shaping a Decency Charter that sits alongside a Diversity Charter. So, what do I actually want?


As a starter for ten:

I want to feel safe and comfortable – Make it easy. I don’t want to have to think about it too hard if something happens to me or one of my friends – I need something that is so easy that I don’t have to look far to know what to do. I need to know what to do when something happens. I want to have a ‘home’ to go to, if something happens – that can be a location, or a person to call. I want to talk to someone. I want a number to call that is very visible on my event pass or pack so I can find it easily. I don’t want to google around for a form to fill in because that introduces a delay when it goes to an organizer, plus I am worried about putting my concerns about an individual or an event down in writing in case it gets into the wrong hands. This won’t secure my safety after the event, and that worries me, too. If I make a complaint, I can’t be sure that it would be successfully resolved and all relevant data removed, or handled confidentially. Google forms are so easily digested and forwarded by email and, like feathers, they could spread. I just want to talk to someone, in my own time. So, before, during and after the event, I’d ideally like each event to have a named panel of people who will listen to my concerns and act upon them in a clearly documented way.

I want others to feel safe and comfortable – I expect people to be able to answer accusations made about them. I don’t want people to think that the Microsoft Data Platform community, for example, is some den where there is a lot of harassment. There isn’t, but I’d like to see a Decency Charter in place in case there is.

I want to have a voice – I don’t want my voice taken away from me. I don’t want other people to speak for me. It’s easy for people to propose things without asking victims what they want, it’s very easy to dictate an approach from a point of privilege.

I want other people to have a voice – because everyone should be allowed to speak for themselves.

I expect confidentiality. I don’t expect people to repeat private details or rumours. At best, it immediately breeds distrust and you will never earn it back. At worst, you can deeply impact someone’s life by handling issues insensitively, and this cuts both ways. An accusation can’t be a condemnation, and there also has to be a balance with protecting people at the same time. Gossip doesn’t make me trust your processes in resolving things, and it has to be well thought out from all angles.  People can see how people behave with one another, and it’s a halo effect.

I expect you not to judge.

I expect to be able to get help right now, and have event organizers and volunteers who can support me if I need it. This is simply making sure that event volunteers are trained in knowing who to alert when something happens and responding thoughtfully and without judging, and, ultimately, centred on sensitivity.

I expect to be able to get help after the event, and have event organizers and volunteers who can support me if I need it.  I think that having an easily-available contact in place, well after the event, would be a good step. Event organizers usually have to clear things up well after an event, so this isn’t an onerous issue at all.

So how could this shape up?

I’d like to propose that, along with the Diversity Charter, we roll out an accompanying Decency Charter, similar to OpenCon Community Values or the PASS Anti-Harassment policy. The PASS one is a good model but it only affects PASS events, and I’d like it to be an ‘open source’ way forward for community models. I think that, if we offered a ‘package’ of a Diversity Charter plus accompanying Decency Pledge, then the community would have a template of ‘add-ons’ that they can choose to flex and use for their own events. They are absolutely welcome to change and adapt as they see fit. I think it would be great to get a version 1.1 out there for the community to review and we can see what changes I get back.

What problem does this solve?

People don’t know where to start so we can give them a hand up.

As part of the speaker selection process, speakers can submit their past speaking experience. Organisers can choose to follow up with those past events to see if there are any issues with speakers; in any case, they should be doing their due diligence on speaker selection anyway, so it should not cost much effort just to ask if there were any other issues that they should know about. It’s hard to deal with attendees because they are harder to police, and they can provide anonymous details at the point of registration. However, a robust Decency Pledge would send a message before people turn up to the event, and they should agree to adhere to it as part of the event registration process.

It’s so much easier to talk facts to someone, which is why I think organizers can offer contact details in case anyone wants to get in touch with them after the event.

Here are some resources to follow up:

PASS Summit Anti Harassment Policy

Enforcing a Code of Conduct

Responding to Reports of Harassment at Tech Events

I also want to add these resources in case this blog triggers anyone:

Male Rape and Sexual Abuse – because men can be victims, too.

Supporting a Survivor. 

I wanted to put this poem here, which is Invictus by William E. Henley:

Out of the night that covers me,
Black as the pit from pole to pole,
I thank whatever gods may be
For my unconquerable soul.

In the fell clutch of circumstance
I have not winced nor cried aloud.
Under the bludgeonings of chance
My head is bloody, but unbowed.

Beyond this place of wrath and tears
Looms but the Horror of the shade,
And yet the menace of the years
Finds, and shall find me, unafraid.

It matters not how strait the gate,
How charged with punishments the scroll,
I am the master of my fate:
I am the captain of my soul.

You’ve got this.

I’d love to know what you think. Please contact me at jen.stirrup@datarelish.com and I’ll be pleased to know your thoughts.


Roundup of 2017 Presentations, and what’s next for 2018?

I’ve listed out some of my key speaking engagements for 2017. I am sure that I’ve done more events but this is a good start – often I am so busy that I drop things very quickly after I’ve ticked the box and done it. I’ve noted that I’m speaking to larger audiences over the years, but I’m doing fewer speaking events overall because I simply can’t do all of the events that I’d like to do, so I’ve had to focus better.

I’ve also diversified the locations of my presentations. I was delighted to go to Dubai to present, and I will be doing more over in the Gulf this year. I’m doing a session just before Christmas, and I’ll release more details about that event shortly. I’m also lining up more events in the Gulf next year, because I had such a great experience speaking in Dubai recently. I spoke at a private event in Singapore earlier this year and I’ve been invited back for 2018, and hopefully I can do one of the data or SQL Server meetups. I’ve also been invited back to Jersey for a more in-depth session, and I’ll be glad to do that, too.

For 2018, I’m hoping to do more large events and to do more online sessions as well. My recent Python webinar was very well received and I like the longevity of having sessions up on YouTube.

2017

#TSQL2SDAY 96 – Folks Who Have Made a Difference

This post is part of TSQL Tuesday, a monthly blog party. This month’s topic is “Folks Who Have Made a Difference”, by Ewald Cress. Here is my first TSQL Tuesday in a long time – way too long, actually.


Diversity is so important to me; tech community is a place for people to grow and there are plenty of good hearts in it. I want everyone to feel welcome, really.

The reason that I got into community was that I attended my first ever community event, SQLBits in Birmingham. I was nervous and ready to leave when this super friendly guy came up to me, explained he worked for Microsoft and he chatted away and basically he made me stay for the next session.

He sat with me through the lightning talks session and he made me feel welcome. He’d gathered up a few ‘strays’ like me and I had a really nice day; and it helped me to go back.

That gentleman was Andrew Fryer. I have never forgotten his kindness, and he’s inspired me ever since. Andrew probably doesn’t even remember, and it may have been a little thing to him. But for me, it was huge and it helped change my life. If we can all do ‘small’ things like this, they add up, right?

If it wasn’t for Andrew saying hello that day, I would have left and never gone back to any events, probably. So I want to say Thank You to Andrew for reaching out that day, and for all of his support ever since.

 

 

HR and Digital Transformation session in Dubai

Here are the slides for my session in Dubai, on HR and Digital Transformation. I talked a little about Excel, Power BI, SQL Server and Oracle, as well as some of the new ways in which Artificial Intelligence could help HR.


I decided I would try and show courtesy to my hosts by wearing a headscarf. I appreciated their invite deeply and I am glad that I wore it.

My HR Summit and Expo 2017 session was held on 8th November at the Za’abeel Halls 5 – 6, Dubai International Exhibition and Convention Centre, Dubai. It was a truly amazing event with great content and fantastic networking. The audience were my ideal audience – really keen to learn how they could use data and technology for the purposes of the business. I would love to go back next year. I was fortunate to attend some of the other sessions and each one was a real gem. I took a lot of notes for my Executive MBA class so it was very useful from that perspective, too.

I tend to talk ‘around’ slides rather than read off them. How is your current Digital Transformation strategy performing? Are you part of an organisation which is struggling to know how to deal with your data? Do you think that you might have a Big Data problem, but you’re not really sure where to start? How do you get the experience, skills or techniques to get it right, faster and within budget? Digital Transformation is a hot topic with CEOs and the C-level suite, renewing their interest in data and what it can do to empower the organisation. As part of the Digital Transformation story, data can help to bring clarity and predictability to the HR leader to make strategic decisions, understand how their customers and employees behave, and measure what really matters to the HR team and the organization overall. Join this session to learn about effective principles and practices for Digital Transformation for the HR Leader. You will obtain advice and suggestions to help you to tackle these issues with Digital Transformation, Big Data and other issues to drive your organisation’s short and long term future, using data and Power BI.

Here are the slides:

Enjoy!