Guess who is appearing in Joseph Sirosh’s PASS Keynote?

This girl! I am super excited and please allow me to have one little SQUUEEEEEEE! before I tell you what’s happening. Now, this is a lifetime achievement for me, and I cannot begin to tell you how absolutely and deeply honoured I am. I am still in shock!

I am working really hard on my demo and….. I am not going to tell you what it is. You’ll have to watch it. Ok, enough about me; I’ll say just two things. First, it’s something that’s never been done at PASS Summit before. Secondly, watch the keynote because there may be some discussion about….. I can’t tell you what… only that it’s a must-watch, must-see, must-do keynote event.

We are in a new world of Data and Joseph Sirosh and the team are leading the way. Watching the keynote will mean that you get the news as it happens, and it will help you to keep up with the changes. I do have some news about Dr David DeWitt’s Day Two keynote… so keep watching this space. Today I’d like to talk about the Day One keynote with the brilliant Joseph Sirosh, CVP of Microsoft’s Data Group.

Now, if you haven’t seen Joseph Sirosh present before, then you should. I’ve put some of his earlier sessions here and I recommend that you watch them.

Ignite Conference Session

MLDS Atlanta 2016 Keynote

I hear you asking… what am I doing in it? I’m keeping it a surprise! Well, if you read my earlier blog, you’ll know I transitioned from Artificial Intelligence into Business Intelligence and now I do a hybrid of AI and BI. As a Business Intelligence professional, my customers will ask me for advice when they can’t get the data that they want. Over the past few years, the ‘answer’ to their question has gone far, far beyond the usual on-premises SQL Server, Analysis Services, SSRS combo.

We are now in a new world of data. Join in the fun!

Customers sense that there is a new world of data. The ‘answer’ to the question ‘Can you please help me with my data?’ is complex and varied, and it’s very much aimed at cost sensitivities, too. Often, customers struggle with data because they now have a Big Data problem, or a storage problem, or a data visualisation access problem. Azure is very neat because it can cope with all of these issues. Now, my projects are Business Intelligence and Business Analytics projects… but they are also ‘move data to the cloud’ projects in disguise, and that’s in response to the customer need. So if you are a Business Intelligence professional, get enthusiastic about the cloud because it really empowers you with a new generation of exciting things you can do to please your users and data consumers.

As a BI or an analytics professional, cloud makes data more interesting and exciting. It means you can have a lot more data, in more shapes and sizes, and access it in different ways. It also means that you can focus on what you are good at, and make your data estate even more interesting by augmenting it with cool features in Azure. For example, you could add in more exciting things such as the Apache Tika library as a worker role in Azure to crack through PDFs and do interesting things with the data in there. If you bring it into SSIS, then you can spin it up and tear it down again when you don’t need it.

I’d go as far as to say that, if you are in Business Intelligence at the moment, you will need to learn about cloud sooner or later. Eventually, you’re going to run into Big Data issues. Alternatively, your end consumers are going to want their data on a mobile device, and you will want easy solutions to deliver it to them. Customers are interested in analytics and the new world of data and you will need to hop on the Azure bus to be a part of it.

The truth is, Joseph Sirosh’s keynotes always contain amazing demos. (No pressure, Jen, no pressure….. ) Now, it’s important to note that these demos are not ‘smoke and mirrors’….

The future is here, now. You can have this technology too.

It doesn’t take much to get started, and it’s not too far removed from what you have in your organisation. AzureML and Power BI have hundreds of examples. I learned AzureML by looking at the following book by Wee-Hyong Tok and others, so why not download a free book sample?

https://read.amazon.co.uk/kp/card?asin=B00MBL261W&preview=inline&linkCode=kpe&ref_=cm_sw_r_kb_dp_c54ayb2VHWST4

How do you proceed? Well, why not try a little homespun POC with some of your own data to learn about it, and then show your boss. I don’t know about you, but I learn by breaking things, and I break things all the time when I’m learning. You could download some Power BI workbooks, use the sample data and then try to recreate them, for example. Or, why not look at the community R Gallery and try to play with the scripts? You broke something? No problem! Just download a fresh copy and try again. You’ll get further next time.

I hope to see you at the PASS keynote! To register, click here: http://www.sqlpass.org/summit/2016/Sessions/Keynotes.aspx 

SQLSaturday Edinburgh sponsors Melissa Data have some free giveaways

SQLSaturday Edinburgh sponsors Melissa Data are announcing some free Data Quality community editions and giveaways coinciding with SQLPass Summit 2013 and I thought it was worth mentioning.

Melissa Data Features Community Editions of Data Quality Components for SQL Server at PASS Summit

Advanced Formatting and Address Parsing Now Free to SSIS Data Integration Developers

Rancho Santa Margarita, CA, and Charlotte, NC, PASS Summit Booth #221 – October 10, 2013 – Melissa Data, a leading provider of contact data quality and integration solutions, is highlighting newly available, free Community Editions of its flagship Data Quality Components for SQL Server Integration Services (SSIS) at PASS Summit 2013 from booth #221. Community Editions include Contact Verify and MatchUp components, and enable data integration developers with ready access to sophisticated address parsing, formatting, and matching.

“Creating parsing engines is often a complex process for developers, however Melissa Data’s verification tools have already solved the challenge,” said Bud Walker, director of data quality solutions at Melissa Data. “Our Community Editions reduce the time and resources required to build a parser from start to finish, quickly enabling parsing as a foundational capability that allows data to be transformed in many other ways.”
Contact Verify CE enables address, phone and name parsing, and email syntax correction. MatchUp CE provides deduplication of up to 50,000 records using nine basic matchcodes. Melissa Data’s Community Editions of Data Quality Components for SSIS can be downloaded for free with no license required.
Community Editions are part of Melissa Data’s premier collection of data quality tools for all editions of SQL Server 2012, including Enterprise, Business Intelligence, and Standard, via SQL Server Integration Services (SSIS) as well as the SQL Server 2012 Enterprise Edition’s Data Quality Services.
For more information about Melissa Data’s Community Editions of DQ Components for SSIS, visit Melissa Data at PASS Summit (Booth #221), click on www.MelissaData.com/ces, or call 1-800-MELISSA (635-4772).

Connecting SSIS and Oracle: Issue and Resolution #1

Connecting from SQL Server SSIS to Oracle can be problematic, to say the least! I’m logging the issues on my blog as I find them. One error message that turned up recently was the following:

 cannot resolve connection

This can mean that a line is missing from the Oracle file called SQLNET.ORA. For some older implementations of Oracle, the TNSNAMES.ORA file may have the connections defined with a ‘.world’ suffix rather than simply the host name.

This is resolved by ensuring that the SQLNET.ORA file contains the following statement:

names.default_domain = world

Alternatively, you can remember to add ‘.world’ to all of your connection strings. The easiest thing, however, is to ensure that the SQLNET.ORA file has the ‘.world’ information added.
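As a sketch of what this looks like in practice (the server name, port, and service name below are illustrative, not from any real environment), an older-style TNSNAMES.ORA entry carries the .world suffix, and the matching SQLNET.ORA line lets unqualified names resolve to it:

```
# Hypothetical TNSNAMES.ORA entry using the older .world naming convention
MYDB.WORLD =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = myoracleserver)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = MYDB))
  )

# Matching line in SQLNET.ORA, so a connection string of just 'MYDB'
# is expanded to 'MYDB.WORLD' before lookup
NAMES.DEFAULT_DOMAIN = WORLD
```

With that default domain in place, SSIS connection strings can use the short name and Oracle will still find the .world entry.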

Hope that helps!

SSIS and Oracle: challenge of ‘Unable to extend temp segment’ in Oracle

If SSIS is being used to retrieve large amounts of data from an Oracle database, the SSIS package may fail with the following error message:
ORA-01652: unable to extend temp segment by 256 in tablespace
This error basically means that Oracle could not extend the temporary segment for the required number of blocks in the user’s allocated tablespace.  There are a few solutions to this issue:
  • Increase the user’s temp tablespace
  • Although it’s an Oracle error, it may show up structural design issues with the package itself. Thus, the SSIS package could be changed so that, instead of one large query, the Oracle database serves up smaller chunks of data at a time. This may mean ‘daisy-chaining’ a number of components, each of which serves up a smaller part of the larger query that was initially being executed.
As always, it’s best to have a chat with the friendly Oracle DBA who owns the source database for their thoughts.
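As a rough sketch of the first option, assuming the temp tablespace is simply called TEMP and using an illustrative file path (your DBA will know the right names, paths, and sizes for your environment):

```sql
-- Hypothetical example: give the TEMP tablespace more room to grow
-- by adding a second tempfile (path and sizes are illustrative)
ALTER TABLESPACE TEMP
  ADD TEMPFILE '/u01/app/oracle/oradata/ORCL/temp02.dbf'
  SIZE 512M
  AUTOEXTEND ON NEXT 128M MAXSIZE 4G;
```

Note that this treats the symptom; if the package design is the real culprit, the second option above is the longer-term fix.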


Creating Oracle Tables using SSIS

It’s possible to use SSIS to create a new Oracle table for you. However, there are a few ‘gotchas’ that it’s necessary to look out for.
It’s possible to use the ‘OLE DB Destination’ adapter to insert data into Oracle.  Thus, if a new Oracle table is required, it’s possible to select the destination Oracle database, and use the ‘New’ button to generate code for you automatically. This will bring up a text editor, which may have some content similar to the following snippet:

CREATE TABLE "MyTable" (
"ETLLoadID" VARCHAR2(255),
"Column1" NUMERIC(10),
"Column2" NUMERIC(10)
)

This will create a table called “MyTable” for you, which initially sounds like good news. When the SSIS package runs, it will deliver the data into the destination Oracle table.
Don’t get comfort from those green SSIS boxes, however. It’s important to be careful with those quotation marks.
Thus, when you try to select from the Oracle table using the following statement, it won’t work:
select ETLLoadID from MyTable
This generates the following error message:
ORA-00904: Invalid Column Name
The reason is that quoted identifiers in Oracle are case-sensitive, so the select statement needs quotation marks around the column name. This will work better:
select "ETLLoadID" from MyTable
In order to find out exactly how the table has been implemented, it’s possible to use the DESCRIBE command. This is like SP_HELP in SQL Server, and provides you with the details you need about the table.
Instead of using SSIS to generate tables, I now create my own tables in Oracle since it gives me the control I need, and means that I don’t have to worry about quotation marks. If you need a useful link to help you with the syntax, here it is.
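As a sketch of the hand-rolled approach (using the illustrative table and column names from the snippet above): with unquoted identifiers, Oracle stores the names in upper case and matches them case-insensitively, so the quotation-mark problem goes away.

```sql
-- Unquoted identifiers are stored in upper case and matched
-- case-insensitively, unlike the quoted names SSIS generates
CREATE TABLE MyTable (
  ETLLoadID VARCHAR2(255),
  Column1   NUMBER(10),
  Column2   NUMBER(10)
);

-- Any casing now works, with no quotation marks needed
SELECT etlloadid FROM mytable;
```

The OLE DB Destination can still deliver rows into this table; you have simply taken the DDL out of SSIS’s hands.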
I hope that helps!


SSIS and Intelligent Laziness: SQL Server Package Configurations

Porting and deploying SSIS 2008 packages as part of the software development life cycle can be a tricky process! Applying the principles of ‘Intelligent Laziness’, here are some thoughts on the package configuration method that results in the minimum work with the maximum long-term results! There are two main issues in porting packages from development through to the production environment. The first is ensuring that the package runs in the new environment without any problems. The second is ensuring that the deployed package picks up the correct production settings, not the old development settings.
Fortunately, there are different ways in SSIS of achieving the maximum portability in deploying packages throughout the software engineering cycle from development to production. The SSIS package is a collection of objects which depend on the environment. This can make deployment more complex than it perhaps needs to be for two reasons, which I call the ‘between environment’ and ‘within environment’ issues.
Between environment issue: for example, the package may take in data from one system, or perhaps load data from a file. This may change from environment to environment at the same point in time.
Within environment issue: there is the practical consideration that the same environment will change over time. For example, security access may change, which means that your package’s login to source systems may need to be amended.
In order to address these issues, it’s important to be able to modify a package with the minimum disruption to the package operation; preferably, to make these changes in a way that is insulated from the package itself. SSIS offers a way of doing this, which is called Package Configurations. These have two important aspects:
Package Path to the property – this is like an address for each object in the package. An example may include \Package.connections[ConnectionName].ConnectionString, which may hold the connection string for an Oracle source database, for example.
Property Value – the actual value when the address is resolved, e.g. server name, user name
There are many different ways to configure Package Configurations, but my preferred method is the SQL Server package configuration method. Essentially, SQL Server table configurations mean that the package configuration values are stored in a SQL Server table, which is de-coupled from the package itself. The packages only need to be deployed once. Each time the package executes, it sucks in the configuration parameters from the SQL Server table. The SQL Server table does not even have to be stored on the SSIS server; it can be stored on a dedicated server. This is my preferred method for a number of reasons:
Flexible, central administration – wholesale changes can be adopted much more easily than with XML configuration files, which would need to be re-deployed each time.
Transparency – by allowing support and development teams to be able to view the configuration table, then it’s possible to make the packages easier to administer. In the case where a package is being deployed, it’s possible for the configurations to be double-checked by everyone involved in the deployment process.
Managing ‘within environment’ changes – For example, let’s take the case where the package login needs to be changed from one value to another, or a password needs to be changed. By simply running an update statement before the next iteration of package execution, the password value would be simply retrieved from the central SQL table at the point of execution. In the situation where there are a number of packages that require an amended login, then this one simple change would mean that all of the packages would be automatically updated.
Managing ‘between environment’ changes – When taking a package from development through to production, it is possible to execute update statements so that the package is pointing at the production settings rather than the development settings. This can be done as part of the deployment process. Further, in releasing an update script, the deployment personnel can double-check that the settings are correct.
Connection Manager changes – It’s possible to use the table to store connection manager information. This is useful when moving the SSIS package through the software engineering cycle from dev onwards.
Configuration Re-use – since the configurations are centrally stored, it is possible to re-use existing configurations. In reality, you probably end up building different packages from the same data stores; so why not re-use? This may save you some time in the longer run.
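As a sketch of what this looks like in practice: the table below follows the default shape that the SSIS configuration wizard generates (your table name, filter names, and package paths will differ), and a ‘within environment’ change then becomes a single UPDATE statement.

```sql
-- Default shape of the wizard-generated SSIS configuration table
CREATE TABLE [SSIS Configurations] (
  ConfigurationFilter NVARCHAR(255) NOT NULL,  -- groups related settings
  ConfiguredValue     NVARCHAR(255) NULL,      -- the value applied at run time
  PackagePath         NVARCHAR(255) NOT NULL,  -- 'address' of the property
  ConfiguredValueType NVARCHAR(20)  NOT NULL   -- e.g. String, Int32
);

-- Hypothetical 'within environment' change: every package sharing this
-- filter picks up the new server name on its next execution
UPDATE [SSIS Configurations]
SET ConfiguredValue = 'NewProdServer'
WHERE ConfigurationFilter = 'SourceOracle'
  AND PackagePath = '\Package.Connections[Source].Properties[ServerName]';
```

A ‘between environment’ deployment script is just a batch of such UPDATEs run against the target environment’s configuration table.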
Package configurations can be your friend! They can really help developers to stick to software development principles of moving through dev/sys/uat/production. Personally, I don’t like using a production environment as a test environment and I would never recommend it. The usual ‘we don’t have the money to have separate environments’ excuse no longer washes, with the availability of VPCs and Hyper-Vs. To summarise, SQL Server package configurations can assist you to manage SSIS packages centrally with the minimum work involved in the management process.

Adventures in SQL: Using ASCII to remove carriage returns in user input

A current piece of work involves migrating data from a source system, where there is a lot of user input, into a target system.  The data migration process can always be tricky when user-generated input is involved, since the process involves formalising and sanitising this fluid, informal user input into a formal structure. The informal nature of user input can make this process an interesting challenge!
One recent project involved taking data from a source system and inserting it into a well-known target database, which is a well-known timesheet system. The target system takes comma separated value (CSV) files and imports the data using a proprietary engine.  The data is then propagated to its own underlying SQL Server database.
The CSV file production strategy has involved using SSIS 2008 to extract data from the source system, and place the data into the formal structure of the CSV files. One problem area, as expected, are free-text fields, where users can type in whatever they like!
One issue for CSV file production is carriage returns in free-text fields. SSIS sees the carriage returns and behaves as it’s been told to do: it starts a new line. This can be seen in the Flat File Connection Editor below:

Carriage Returns in Flat File Editor

The {CR}{LF} are control characters which denote ‘carriage return’ and ‘line feed’, respectively. When SSIS sees these control characters in the incoming text field, it naturally believes that it should create a new line in the CSV file, which it then does. However, this causes a problem for the CSV file format overall, because the next line will only have one column in it, and will not be correctly formatted for the data import to work properly.
In order to resolve this issue, it is possible to use lower-level constructs to strip out any erroneous carriage returns in the user input. The SQL CHAR function converts an INT ASCII code to a specific character. Here, it is used to match, within character strings, the control characters that we see in the earlier image. Here is an example, which looks for a carriage return (CHAR(13)) and line feed (CHAR(10)):

select REPLACE(cast(w.userentereddetail as nvarchar(4000)), CHAR(13) + CHAR(10), ' ') as UserComments from tbl w

This SQL Statement uses a combination of CHAR and REPLACE to find any carriage returns in the ‘userentereddetail’ column, and will replace them with a space. This means that any carriage returns in the user input field are captured and replaced so that they will not interfere with the CSV production by SSIS.
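One hedged variation, in case the source system also emits lone carriage returns or line feeds rather than the usual CR+LF pair (the table and column names are the illustrative ones from above): nesting REPLACE handles each control character individually.

```sql
-- First collapse CR+LF pairs, then catch any stray lone CR or LF
SELECT REPLACE(REPLACE(REPLACE(
         CAST(w.userentereddetail AS NVARCHAR(4000)),
         CHAR(13) + CHAR(10), ' '),  -- the pair SSIS trips over
         CHAR(13), ' '),             -- lone carriage return
         CHAR(10), ' ') AS UserComments  -- lone line feed
FROM tbl w;
```

The order matters: replacing the pair first avoids turning one CR+LF into two spaces.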
To summarise, perhaps I’m showing my age by talking about ASCII. However, these little tidbits can be useful to know!