Ada Lovelace Blog: In memory of those who didn’t make it

The Ada Lovelace day is designed to celebrate women’s achievements in science and technology. Amidst all of the celebrations, there should also be pause for those women who didn’t achieve their potential.
One woman whom I hugely admire is Sophie Scholl, a 21 year old Biology student, who was murdered by Nazis in February 1943. Although she is a legend in her native Germany, it’s not clear that she’s well known outside of Germany. By writing this blog, I hope that she will inspire you as well.
Based on her central Christian conscience, Sophie Scholl made her own peaceful protest against Nazi tyranny and the persecution of Jews. Sophie and her brother Hans were key members of a movement known as the ‘White Rose’, which distributed leaflets in Munich in 1943, which opposed the Nazi regime.
The Scholl family had experience of Gestapo attention in their life, due to punishments received by Hans during his membership of the Hitler Youth movement. Sophie would have been well aware of the risks that she was taking. Whilst distributing the sixth series of leaflets, Sophie was captured and questioned by the Gestapo. This peaceful protest led to Sophie’s conviction for treason against the Nazi state, and her subsequent execution by guillotine, at the age of 21.
Sophie is a heroine of the 20th century; her absolute conviction of her beliefs led her to action. Her faith led her to believe that ‘somebody… had to make a start’ against the Nazi regime.  And make a start she did. After her death, the Allied Forces received a smuggled copy of the sixth leaflet, and went on to drop millions of copies of the leaflet all over Germany.
The fact remains that such a young person, living in such a repressing regime as the Nazi regime, really made a difference to the world around her. This contribution wasn’t in science or technology, simply because she was executed before she had the chance to finish her studies. Sophie’s contribution is truly humbling, and her act of sacrifice is probably one of the most significant of the 20th Century.
Amongst all of the shame associated with Germany’s Nazi past, it is important not to forget her sacrifice. And I will leave the last word to Sophie: “How can we expect righteousness to prevail when there is hardly anyone willing to give himself up individually to a righteous cause. Such a fine, sunny day, and I have to go, but what does my death matter, if through us thousands of people are awakened and stirred to action?”

Adventures in SQL: SP_MSFOREACHTABLE to find the largest table

When exploring a SQL Server database for the first time, it’s always useful to identify the most populated tables first because they are very likely to be the most important tables. To do this in a situation where there are tight time-scales, it’s possible to use some undocumented SQL Server procedures to quickly produce a list of the tables which have the most rows.
Before proceeding, however, it’s wise to note that undocumented stored procedures are not guaranteed by Microsoft. Further, they are not ‘fixed’ so they could change without warning, so use this command at your own discretion, and treat the results with care!
One example of an undocumented SQL Server procedure which has been useful is SP_FOREACHTABLE.  Although it’s not guaranteed, it might be useful for some to try it out as a starting point when exploring data, when it’s useful to find out the most populated tables in your new unexplored database.
In this example, a new table called TableRowCount will be created. This table will hold the results of the execution of SP_FOREACHTABLE.  To assist you, the SQL to do this is given below:

/****** Check if table TableRowCount exists already, and if so, delete it    ******/ IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[TableRowCount]’) AND type in (N’U’))
DROP TABLE [dbo].[TableRowCount]
/****** Create table TableRowCount                                                         ******/
CREATE TABLE [TableRowCount](
  TableName sysname,
  [TableRowCount] int )

Once the table is created, the SP_FOREACHTABLE procedure is executed as follows:

/****** Execute stored procedure, and save results in table TableRowCount ******/
EXEC sp_MSForEachTable ‘INSERT [TableRowCount](TableName, [TableRowCount]) SELECT ”?”, COUNT(*) FROM ?’

Then, it’s possible to retrieve the results of this command from the TableRowCount table, as follows:

/****** Display the results from table TableRowCount, in descending order ******/

select * from TableRowCount
order by TableRowCount.TableRowCount desc

The above statement will sort the data so that the table with the highest number of rows will appear at the top. It’s a guess that the most populated table is an important one, but it’s not a bad guess all the same!

Adventures in SQL: Using ASCII to remove carriage returns in user input

A current piece of work involves migrating data from a source system, where there is a lot of user input, into a target system.  The data migration process can always be tricky when user-generated input is involved, since the process involves formalising and sanitising this fluid, informal user input into a formal structure. The informal nature of user input can make this process an interesting challenge!
One recent project involved taking data from a source system and inserting it into a well-known target database, which is a well-known timesheet system. The target system takes comma separated value (CSV) files and imports the data using a proprietary engine.  The data is then propagated to its own underlying SQL Server database.
The CSV file production strategy has involved using SSIS 2008 to extract data from the source system, and place the data into the formal structure of the CSV files. One problem area, as expected, are free-text fields, where users can type in whatever they like!
One issue for CSV file production is carriage returns in free-text fields. SSIS sees the carriage returns, and behaves as its been told to do: it starts a new line.  This can be seen in the Flat File Connection Editor below:

Carriage Returns in Flat File Editor

The {CR}{LF} are control characters which denote ‘carriage return’ and ‘line feed’, respectively. When SSIS sees these control characters in the incoming text field, it naturally believes that this means it should create a new line in the CSV file: which it then goes and undertakes this activity. However, this causes an issue for the CSV file format overall, because the next line will only have one column in it, and will not be correctly formatted for the data import to work properly.
In order to resolve this issue, it is possible to use lower-level constructs in order to strip out any erroneous carriage returns in the user input. The SQL CHAR command is a string function that converts an INT ASCII code to a specific character. Here, it is used to insert control characters that we see in the earlier image, into character strings. Here is an example, which looks for a carriage return (CHAR(13)) and line feed (CHAR(10)):

select REPLACE(cast(w.userentereddetail as nvarchar(4000)), CHAR(13) + CHAR(10), ‘  ‘) as UserComments from tbl w

This SQL Statement uses a combination of CHAR and REPLACE to find any carriage returns in the ‘userentereddetail’ column, and will replace them with a space. This means that any carriage returns in the user input field are captured and replaced so that they will not interfere with the CSV production by SSIS.
To summarise, perhaps I’m showing my age by talking about ASCII. However, these little tidbits can be useful to know!