Microsoft Ignite interview with Kevin Farlee on Azure SQL Database Hyperscale

Azure SQL Database is introducing two new features to cost-effectively migrate workloads to the cloud. SQL Database Hyperscale for single databases, available in preview, is a highly scalable service tier that adapts on demand to workload needs. It auto-scales up to 100 TB per database to significantly expand potential for app growth.
What does this mean? It’s one of the most fundamental changes to SQL Server storage since SQL Server 7.0. So this is big: big news, and very big data stores. I am very lucky because I got to interview Kevin Farlee of the SQL Server team about the latest news, and you can find the video below.

I am sorry about the sound quality, so I have blogged the key points here to make sure the message is clear. When I find the Ignite sessions published, I will add in a link as well.
What problem are the SQL Server team solving with Hyperscale? The fundamental problem is how you deal with very large databases (VLDBs) in the cloud. VLDBs make the normal operations that people want to do a problem. All of the difficulties with VLDBs come down to the sheer size of the data: backups, restores, maintenance operations and scaling all become painful, and sometimes these activities can take days, which is downtime the business will not wait for. If you are talking tens of terabytes, ultimately Microsoft needed a new way to protect data in VLDBs. The SQL team did something really smart and rethought, very creatively, how they do storage, in order to take care of the issues with VLDBs in the cloud.
So, the Azure SQL Database team did something that is completely in line with one of the main benefits and key features of cloud architecture: they split the storage engine out from the relational engine. The storage implementation was completely rethought and rebuilt from the ground up. They took the viewpoint: how would you go about architecting, designing and building a solution like this in the cloud if you were starting from scratch?
The Azure SQL Database team did a smart thing: Hyperscale uses microservices to handle VLDBs.
The compute engine is one microservice taking care of its role; another microservice takes care of the logging; and then a series of microservices handle the data. These are called page servers, and they interface at the page level. The page servers host and maintain the data files. Each page server handles about a terabyte of data pages, and you can add on as many as you need.
Ultimately, compute and storage are decoupled, so you can scale compute without moving the data. This means it’s possible to keep adding more and more data, and it also means that you don’t have to deal with moving data around, which isn’t a trivial task when there are terabytes and terabytes of it. Each page server handles about a terabyte of data, and it keeps roughly a terabyte’s worth of SSD cache in front of that data.
The ultimate storage layer is Azure Blob Storage, because blob storage is multiply redundant and has features like snapshots. This means backups can be taken simultaneously by simply snapshotting all of the blobs, with no impact on the workload.
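For anyone who wants to try the preview, my understanding is that a Hyperscale database can be created, or an existing Azure SQL database converted, with T-SQL along the following lines. The service objective names here are an assumption on my part, so do check the current documentation before relying on them:

-- Run against the master database on the logical server
-- Create a new Hyperscale database (service objective name is an assumption)
CREATE DATABASE MyHyperscaleDB ( EDITION = 'Hyperscale', SERVICE_OBJECTIVE = 'HS_Gen5_4' );

-- Or convert an existing Azure SQL database to the Hyperscale service tier
ALTER DATABASE MyExistingDB MODIFY ( EDITION = 'Hyperscale', SERVICE_OBJECTIVE = 'HS_Gen5_4' );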
Restores
A restore is just a matter of instantiating a new set of writeable disks from a set of snapshots; the page servers and the compute engine work in concert to take care of it. Since you’re not moving the data, it is faster.
I’m personally very impressed with the work the team has done, and I’d like to thank Kevin Farlee for his time. Kevin explains things exceptionally well.
It’s worth watching the video to understand it. As well as the video here, Kevin goes into detail in his Microsoft Ignite sessions, and I will publish more links when I have them.
Community
One advantage of the Microsoft Community Reporter role is that I get to learn from the experts, and I enjoyed learning from Kevin throughout the video.
It seems to me that the Azure SQL Database team have really heard the voice of their technical audience, and they have worked hard and passionately to tackle these real-life issues. I don’t know if it is always very clear that Microsoft is listening, but I wanted to blog about it, since I can see how much the teams take on board the technical ‘voice’ from the people who care about their solutions, and who care enough to share their opinions and thoughts so that Microsoft can improve them.
From the Azure architecture perspective, it works perfectly with the cloud computing concept of decoupling the compute and the storage. I love watching the data story unfold for Azure and I’m excited by this news.

Eating the elephant one bite at a time: knowing your tables

In this segment, we will look at the most basic unit of the database: tables. We will look at their implementation in SQL Server and also in Hive.

Tables, in SQL Server, contain the data in a database. SQL Server has a number of different types of table in addition to the standard tables defined by the user. These table types include partitioned tables, temporary tables, system tables and wide tables. If you need more information on these types, please visit the official Microsoft documentation here.

In Hive, there are two types of tables: internal and external tables. In our last topic, we touched on tables and how we need to drop them in Hive before we can delete the database (unless you use the CASCADE command). 
Cindy Gross has written an excellent blog post on internal vs external tables, and you can access it here.

For SQL Server people, it is important not to think that the internal/external distinction is the same as the local/global temporary tables discussion. Local temporary tables are visible only to the current session, whereas global temporary tables are visible to all sessions. Temporary tables in SQL Server cannot be partitioned.
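As a quick T-SQL illustration of the local versus global distinction (the table names are just examples):

-- Local temporary table: visible only to the session that created it
CREATE TABLE #LocalExample (Id INT);

-- Global temporary table: visible to all sessions until the last session referencing it ends
CREATE TABLE ##GlobalExample (Id INT);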

It is also important to note that Hive tables are not akin to SQL Server system tables, which is where SQL Server stores its internal knowledge about the configuration of the server. Users cannot query or update system tables directly; instead, the information is exposed via system views.

Instead, the internal/external distinction refers to the level at which Hive manages the table, and whether Hive sees the data as being shared across different tools or used by Hive only. Internal and external tables in Hive can both be partitioned, and we will look at this further in a later post. Unlike a SQL Server system table, an internal table stores ordinary data rather than data about the system itself.

For our purposes here of eating the Hadoop elephant, one bite at a time, we will look at the differences between these table types. Here is a summary below:

Managed / Internal Tables – Hive owns the data; dropping the table deletes both the table metadata and the actual data.
External Tables – Hive points at the data but does not own it. Dropping the table deletes the metadata, not the actual data.
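A quick way to see the difference in practice (the table names and location path below are just illustrative):

-- Managed (internal) table: Hive owns the data files under its warehouse directory,
-- so DROP TABLE removes the metadata and the data
CREATE TABLE report_internal (CountryName STRING);

-- External table: Hive only points at files in the location you give it,
-- so DROP TABLE removes the metadata but leaves the files in place
CREATE EXTERNAL TABLE report_external (CountryName STRING)
LOCATION '/user/hue/example';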

The data for this table is taken from the UNICEF Report Card, which details children’s wellbeing in different countries. Here, the data looks at children’s wellbeing in rich countries, and you can read the report here.


Creating an External Table

In Hue, you need to be sure that you are pointing at the correct database. 
On the left hand side, you will see a drop down list of databases. 
We are going to choose the database ‘IncomeInequality’
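Alternatively, you can switch the current database from the query editor with the USE statement:

-- Switch the current database before creating the table
USE IncomeInequality;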

The external table can be created using the following statement, which is terminated by a semi-colon “;”.

CREATE EXTERNAL TABLE IF NOT EXISTS UNICEFReportCard (
    CountryName STRING,
    AvgRankPosition FLOAT,
    MaterialWellBeing FLOAT,
    HealthAndSafety FLOAT,
    EducationalWellBeing FLOAT,
    FamilyPeersRelationships FLOAT,
    BehavioursAndRisks FLOAT,
    SubjectiveWellBeing FLOAT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/user/hue/UNICEF';


This query can be copied and pasted into the query editor, and then you can execute it simply by pressing the Execute button.
In order to check that the table has been created, you can also run a simple SELECT statement in the Beeswax Query Editor to see if the data exists:
SELECT * FROM IncomeInequality.UNICEFReportCard;

This will return no data yet, but if it does not error, then we know that the table exists.

Alternatively, you can click on the Beeswax (Hive UI) and then select the ‘Tables’ button.

Make sure you’ve selected the correct database, IncomeInequality, and then you will see the table on the right hand side.

When you click on the table, you can see two options: you can see the columns, or view a sample of the data. 
So, how do we get data into the table? 
In SQL Server, we’d use SQL Server Integration Services.
In Hive, using the Beeswax GUI, we load up some data files. The external table is essentially a structure over the data. 
So, if we drop the table, we don’t delete the data; it still resides in the data files. 
There are a number of ways to load files:
 The simplest is through the GUI. Simply select ‘Upload’ and insert your file. 
If you have more than one file, you could select a zip archive and your files will be loaded.
One caveat here: when I was using Internet Explorer version 10.9, the ‘Upload’ button didn’t seem to work properly. I switched to Firefox, and it worked fine.
Once you’ve uploaded the file, you can see it in Beeswax.

8 Beeswax File is uploaded

Now, if you go back and view the table, you can see it contains the data from the file. Here is an example below:

9 UNICEF Report Card data is loaded

You could also execute the following command from the Hive Query Editor in Beeswax:
LOAD DATA INPATH '/user/hue/UNICEF/UNICEFreportcard.csv' OVERWRITE INTO TABLE IncomeInequality.unicefreportcard;
Using the interface or queries, how can you tell whether a table is managed (internal) or external? Hive has a command, rather like sp_help in SQL Server, which gives you details about a table.
The Hive command is given here:

DESCRIBE EXTENDED and then the name of the table, for example:

DESCRIBE EXTENDED UNICEFReportCard;
We can see the output below. If you click on the image, it will pop up the original and you can read it better:

10 Extended Table Results

However, this command isn’t very readable, so you might want to try the following:
DESCRIBE FORMATTED default.sample_07;
This produces a prettier format, which is easier to read, and you can see that this sample table, in the default database, is a managed (internal) table.

11 DESCRIBE FORMATTED TableName

The data files for an internal table are stored under Hive’s ‘warehouse’ subdirectory, and Hive looks after the data in these tables.
To summarise, we have looked at the different table types and how to create them, and how to upload data into them. 
Thank you for joining me in this series so far. Our next step is to look at partitioning tables, and then we will start to look at analysing data.
Any questions in the meantime, please email me at jen.stirrup@copper-blue.com
Kind Regards,
jen


Mobile Business Intelligence Presentation Portugal 2013

I gave this presentation at the SQLSaturday event in Portugal during March 2013.  Thanks to the organisers for taking such good care of me during my visit.

As I discussed during the session, there were three main strands of mobilising Microsoft technology:

– use third party products such as PivotStream to free your PowerPivots
– use cloud computing e.g. Azure
– use SharePoint

These options will not suit everyone but it might help somebody to decide which path is the right one for them. I look forward to your commentary and feedback.

SQLSaturday Edinburgh Preconference Training Day sponsored by SQLSkills

The SQLSaturday Edinburgh team are extremely proud to present the following one-day, advanced-level training with SQLskills’ Jonathan Kehayias and Joseph Sack. The Training Day will be held on Friday 7th June at the University of Edinburgh Pollock Halls, Edinburgh, Scotland. You can find full details here, such as travel, accommodation and venue, and you can register here.
 
I’m very proud that Jon and Joe are joining us. They are extremely well respected for their knowledge of SQL Server.
 
You may well have read their books or material. Jonathan is the ‘go-to’ expert and authority on Extended Events in SQL Server 2008; for example, he wrote the definitive whitepaper on Extended Events for Microsoft.
 
Joe is a SQL Server author, and you can see his Amazon page here. In 2009, Joe took over responsibility for the entire SQL Server Microsoft Certified Master program and held that post until 2011.
 
Jonathan and Joe are part of the highly-respected SQLskills team. We are also delighted to announce that SQLskills are our official Preconference Training Day Sponsor, and we are grateful to them for their fantastic support of SQLSaturday Edinburgh.
 
The SQLskills team, brought together by Paul S. Randal and Kimberly L. Tripp, two of the world’s most renowned SQL Server experts, is dedicated to providing the highest quality and most in-depth training and consulting in the market. The SQLskills team have an emphasis on real-world expertise that you can immediately put into practice, making you a better SQL Server professional.
 
The Training Day itself has the following Agenda and you can register here.
 
Leveraging SQL Server 2012 Features and Tools
SQL Server 2012 offers a number of new features that improve high availability, security, performance analysis and scalability, and that can be leveraged immediately for a return on the investment associated with upgrading. In this session, Joe and Jonathan will go through a huge list of problems solved by these new features and tools – and will demo them along the way!
Some of the new SQL Server 2012 features that will be discussed:
  • partitioning enhancements
  • ColumnStore indexes
  • contained databases
  • distributed replay
  • server-level roles
  • predictable failover with failover policies
  • availability groups
  • multi-database failover
  • read-only secondaries
  • and… much, much more!
If you want an information packed session with practical reasons to upgrade and powerful tips and tricks for using the new release – this workshop is for you!
Who are the presenters?
Jonathan Kehayias and Joseph Sack are well-known SQL Server experts, part of the elite SQLskills team of expert consultants.
What will you take away from the session?
By attending this session, you will obtain advanced SQL Server 2012 knowledge that will help you to conduct your upgrade more smoothly, whilst helping you to understand how the new SQL Server 2012 features can help your environment.
SQLSaturday Edinburgh offer very reasonably priced sessions aimed at helping the SQL Server community. Given the low cost of £125 (exclusive of VAT) for this one-day training course, we believe that you will recoup this investment by delivering your upgrade with the expertise that you will gain from this session.

Microsoft SQL Server 2012 Latest Announcements: Hekaton, Service Pack 1 and PDW

There are three main announcements for Microsoft SQL Server 2012 today at the SQLPass Summit 2012 in Seattle, Washington. They are being made by Ted Kummert, Corporate Vice President, Business Platform Division, Microsoft. You can also see Doug Leland’s announcement from the Business Platform Group here.
The announcements focus on three areas:
Ted Kummert introduces a project codenamed Hekaton: a new in-memory technology for transaction processing that is an integral part of the data platform. Hekaton will ship in the next major version of SQL Server, and is wholly a part of it. It’s fast because the engine works entirely in memory, compiling T-SQL to native code and avoiding the latching and locking contention of traditional tables.

The next version of the Parallel Data Warehouse is announced. 


SQL Server 2012 Service Pack 1 will be released. This will have Business Intelligence features and features that are not contained within the Business Intelligence sphere. I will cover these items later on in this blog.


What makes these announcements interesting?

The Hekaton announcement is interesting since it shows that Microsoft are real contenders in the in-memory area. SAP have been doing some interesting things, but now it’s Microsoft’s turn to play.  Hekaton will be part of an end-to-end SQL Server solution with in-memory capabilities from streaming and caching workloads right through to analytics. This offers a complete portfolio for transactional processing – full circle.

Is it fast? You betcha! Based on private customer testing to date, Hekaton will offer performance gains of up to 10 times for existing apps and up to 50 times for new applications optimized for in-memory performance. This is significant, and watch this space for more details. 

Here are the main points:

  • High performance, in-memory OLTP engine (see the sketch after this list).
  • Full ACID support.
  • T-SQL compiled to native code.
  • Enables a massive improvement in performance. This will allow you to on-ramp existing apps, giving integrated performance gains for data loading and updating; reading throughput is also increased, in the best cases, due to the lack of latching and locking contention on the table.
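Hekaton had not shipped at the time of the announcement, so purely as a hypothetical sketch of what declaring an in-memory table might look like (the option names below are assumptions, not confirmed syntax):

-- Hypothetical sketch only: option names are assumptions, not final syntax
CREATE TABLE dbo.SessionState
(
    SessionId INT NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    Payload   VARBINARY(8000)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);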


The Parallel Data Warehouse announcement is very interesting. It will be available in the first half of 2013. 

It will be powered by PolyBase, which allows you to process queries across relational and non-relational Hadoop data, right from SQL Server Management Studio. I don’t know about you, but I think that’s pretty awesome. The analytics that you’ll be able to do, right from good old familiar SSMS, will lead to breakthrough insights. You already know SSMS, and the query engine will translate the queries for you, so you can focus on getting insights out of your data. This will empower you to deal with big data and really use it, from an interface that you already know. It mixes HDFS nodes with SQL Server, which is incredibly powerful, and it means that the PDW becomes a very powerful federated data processing engine.
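The PolyBase syntax had not been published at the time of writing, so treat the following only as a hypothetical sketch of what querying Hadoop data alongside a relational table from SSMS might look like; the object names and options are my own assumptions:

-- Hypothetical sketch only: object names and options are assumptions
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (TYPE = HADOOP, LOCATION = 'hdfs://namenode:8020');

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

CREATE EXTERNAL TABLE dbo.WebLogs (UserId INT, Url VARCHAR(400))
WITH (LOCATION = '/logs/', DATA_SOURCE = HadoopCluster, FILE_FORMAT = CsvFormat);

-- Join the Hadoop-backed table with an ordinary relational table in one query
SELECT c.CustomerName, COUNT(*) AS PageViews
FROM dbo.WebLogs AS w
JOIN dbo.Customers AS c ON c.CustomerId = w.UserId
GROUP BY c.CustomerName;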

The PDW is built for big data, with next-generation performance at scale. It uses the xVelocity column store. For big data, you can start small, with terabytes, and scale out to petabytes. This means we can see a move away from SAN storage to reduce price. Because the hardware and the software are combined and already aligned, the hardware footprint is reduced, which will help customers to reduce costs.

Now for my favourite announcement: SQL Server 2012 Service Pack 1, with a focus on Business Intelligence enhancements. It has new capabilities for Business Intelligence in Office 365.



3. SQL Server Service Pack 1 Enhancements

This was announced back in June 2012. For Business Intelligence, there will be Power View enhancements: geographical mapping, drill up and drill down, hierarchies, backgrounds and, oh, pie charts 🙂

The significance is Excel 2013: with PowerPivot and Power View built in, everyone is now BI-capable. This really offers a data democracy for every Excel user – we will all be data citizens.

Since PowerPivot and Power View are natively part of the product, everyone can use their data to make decisions, and even change the world around them. Look at the great work that the San Francisco and London data citizens are doing with their data. Excel can really help everyone – not just business users – to grab their data and use it to empower themselves.

To summarise, these announcements offer a full, rounded big picture: they will accelerate insights across all data. Big data crunching. Rapid, real-time insights. This is the power of analytics, and it means that the Microsoft Data Platform offers end-to-end coverage, from transactions right through to the business users and the insights they can gain from their data. It’s not enough simply to own data; you have to use it. The new announcements from SQL Server are empowering for business users as well as technical DBAs.

New capabilities = in memory, big data, business analytics. It’s all there. Take a look at the earlier links, and watch the keynote from the SQLPass website.  

SQL Server Data Tools – September 2012 Update and updated Power Tools are released!

The SSDT team’s continued commitment to delivering regular, frequent releases of SSDT to the web has shown itself once again! There’s a new download available to you, and you can get it here.
Also – the SSDT team have produced new Power Tools. The full list is here. The purpose of this blog is to bring the news to Business Intelligence people, in the hope that some of it will be relevant and provoke enough interest for BI people to take a look! The full SSDT Team Blog is here.
So what is all the latest news for Business Intelligence people? Here is a summary of the main topics:
Visual Studio 2012
 – this release of SSDT supports the Visual Studio 2012 shell.
 – it contains several bug fixes to the SSDT version that shipped in Visual Studio 2012.
 – the release can be applied as an update on top of Visual Studio 2012 Professional, Premium and Ultimate Editions.

Updated SSDT Power Tools
The September 2012 update is compatible with updated Power Tools. Take a look here:

 
 These updated Power Tools rely on the September 2012 update, so make sure that you install it first! 
 

LocalDB Configuration Improvements
  – it is now easier to create a single new LocalDB instance, called Projects, to host all of the project debug databases.
  – SQL Server Object Explorer is enhanced by surfacing the default LocalDB instance and enabling the deletion of LocalDB instances.
 
Support for ANSI_NULLS and QUOTED_IDENTIFIER Properties
  – SSDT now supports database objects with non-standard ANSI_NULLS or QUOTED_IDENTIFIER properties in project and script files.
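For example, a script along these lines, which relies on non-default SET options (the object name is illustrative), is now supported within an SSDT project:

-- Illustrative only: an object created with non-default SET options
SET ANSI_NULLS OFF;
SET QUOTED_IDENTIFIER OFF;
GO
CREATE PROCEDURE dbo.LegacyProcedure
AS
SELECT 'legacy behaviour' AS Note;
GO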
I hope that helps – so head on over to the SSDT Team Site and take a look at the September 2012 update and the updated Power Tools.
Enjoy!
Jen

SQL Server 2008 R2 Service Pack 2 (SP2) is Released!

Note the release of SQL Server 2008 R2 Service Pack 2 (SP2). Both the Service Pack and Feature Pack updates are available for download on the Microsoft Download Center. 
This is a free upgrade, so please head over to the Download Center for a look.
This particular pack is interesting because it contains a few key community Reporting Services requests:

  • Reporting Services Charts May Be Zoomed & Cropped
    Customers using Reporting Services on Windows 7 may occasionally find charts are zoomed in and cropped. To work around the issue, some customers set ImageConsolidation to false. This is tackled in the Service Pack.
  • Batch Containing ALTER TABLE Not Cached
    In certain situations, a batch containing the ALTER TABLE command is not cached in its entirety.
  • Collapsing Cells or Rows Render Incorrectly If Hidden
    Some customers who have hidden rows in their Reporting Services reports may have noticed rendering issues when cells or rows are collapsed. When a hidden row is written, the style attribute is opened in order to write a height attribute, and rendering goes wrong if that attribute is left empty when the width should not be zero.