Representing data about the iPad

This blog post looks at three different ways of representing the same dataset, to see how it can be done simply and clearly, or not so clearly. I have taken some samples and reworked them as a progression through the post.

Although I am discussing the iPad here, this is not a preview of my iPad and Mobile Business Intelligence session, which I'm delivering at SQLBits in October, or of my User Group sessions in Leeds and Surrey this year; however, the iPad is obviously very much on my mind, hence the slightly tangential topic of this post!

The dataset is interesting because it aims to show the impact of the iPad announcement on notebook sales. The study was conducted by NPD and Morgan Stanley Research. CNN Money has written a short article on the impact of the iPad on netbook sales, which proposes that the iPad is at least 'partially' responsible for the decline in netbook sales. The rather dramatic bar chart that underlines this point is shown here:
[Chart: US notebook sales, as published by CNN Money]
There are a few issues with the bar chart:

– The axis doesn't run from 0% to 100%, which I would expect, given that it is supposed to show percentages. This skews the results slightly; for example, the 70% peak looks higher than it is.
– The 3D gradient effects don't add anything. Sometimes 3D can make an image look 'prettier', but here it neither adds visual appeal nor enhances the message of the data.
– It's not clear why the data has been represented as distinct categories when time is continuous rather than discrete.
– The big pink arrow shouldn't have been necessary; the chart itself should have been enough to make the point.
– There is nothing to make the negative value stand out, or to distinguish it in any way.

There have been other re-visualisations of the same data. Here is an example from a wonderful infographic produced by the Focus Group. I have taken an excerpt, since the infographic as a whole is not the focus of this post:

iPad and Notebook sales by the Focus Group

The infographic above solves some of the issues of the earlier version reproduced by CNN Money:

– There is no 3D
– The big arrows have gone

However, although it is visually appealing, it repeats one of the issues found in the CNN Money chart: the Y axis still does not reach 100%. It also introduces some new issues:

– The black background might be visually appealing, but as a matter of 'best practice', a white background is better, because it lets the data dominate the scene rather than the background or other unnecessary elements.
– Hatched lines replace the arrows to mark the announcement of the iPad and its actual release. This is an issue because the hatching is slightly jarring to the eye.
– The timeline isn't marked at even monthly intervals, so it is difficult to tell whether the data is skewed horizontally in any way.

To improve on these representations of the data, I used Tableau to create a simple line graph. This was all that was needed to get the message across without skewing or obscuring it in any way. Here is my version, which can also be found on the Tableau Public website:

iPad and Notebook Sales

I have addressed the issues found in the earlier visualisations and added some further enhancements:

– The negative growth percentage has been highlighted in red.
– I have added clean annotations which do not obscure other parts of the data visualisation.
– The Y axis extends to 100%, so that the data is not skewed.
– I used a line graph, since time on the X axis is continuous, not discrete.
– I removed the black background, to emphasise the parts of the visualisation that carry the message of the data.
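The effect of the Y-axis range can be made concrete with a little arithmetic. On a linear axis, a value's height within the plotting area is (value - y_min) / (y_max - y_min). The Python snippet below is a minimal sketch of that formula; it assumes a -10% floor and compares a ceiling at the 70% peak with one at 100%. Those axis ranges are illustrative assumptions, not measurements of the published charts, and the function name plot_fraction is hypothetical.

```python
def plot_fraction(value, y_min, y_max):
    """Return how far up the plotting area a value sits on a linear axis
    (0.0 = bottom edge of the frame, 1.0 = top edge)."""
    if y_max <= y_min:
        raise ValueError("y_max must be greater than y_min")
    return (value - y_min) / (y_max - y_min)


# Illustrative axis ranges (assumptions, not read off the published charts):
# a "tight" axis that tops out at the 70% peak, and one extended to 100%.
AXES = {
    "tight (-10% to 70%)": (-10, 70),
    "extended (-10% to 100%)": (-10, 100),
}

for label, (lo, hi) in AXES.items():
    peak = plot_fraction(70, lo, hi)   # the 70% growth figure (Dec-09)
    later = plot_fraction(35, lo, hi)  # the 35% growth figure (Feb-10)
    print(f"{label}: 70% at {peak:.2f}, 35% at {later:.2f}, "
          f"drop = {peak - later:.2f} of the plot height")
```

In both cases 70% stays twice as far above the zero line as 35%, because the scale is linear. What changes is how much of the frame the fall from 70% to 35% occupies (about 0.44 of the plot height on the tighter axis versus about 0.32 on the extended one), which is why the same drop can look steeper or shallower.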

Although the data visualisation has been improved, there are still contextual questions which the graph cannot answer:

– what about the impact of the iPhone, or other tablets?
– what about the impact of the time of year e.g. post-Christmas sales?
– what about the impact of the impending recession?

Therefore, the initial analysis described by CNN Money provided only a 'headline' message, and further analysis would be needed to answer the question more fully. That said, a proper visualisation of the data is a useful tool for getting both the 'bigger picture' and the 'smaller picture' right.

I hope that this was interesting, and I look forward to your comments.
Jen x

11 thoughts on “Representing data about the iPad”

  1. Hi Jen – I'm not going to disagree with you on the 3D not adding any value on this one 😉 However, I don't agree that extending the y axis to 100% improves the graph. You're just adding more white space in which no data lies, artificially reducing distortions in the shape of the curve. It may not be “chart junk”, but it's “chart void”, which I would regard as an equal sin: you should seek to maximise the extent of useful data on the page rather than squishing it up.

    I'm not sure I understand your justification that “The axis doesn't go from 0 – 100%, which I would expect, given that it is supposed to show percentages.”
    Why should percentage values always only go from 0 – 100%?
    The percentage in question is year on year growth, which in this dataset has a maximum value of 70%, but could easily have had values of 110%, or 200%, or more. Would you still have capped the scale at 100% in those cases?
    And if you feel there is an intrinsic reason why a graph should only show percentage values from 0 – 100 (which I don't agree with anyway), why did you not also extend the y axis downwards from 0 to -100% rather than stopping at -10%?

  2. Hi Tanoshimi,
    Thank you for your thoughtful comments.
    The Y axis is an interesting debate, and Stephen Few's 'Now You See It' contains some interesting thoughts on it.
    IMHO, if you change the axis from what people expect, then that's OK as long as you point it out and emphasise it.
    I tend to stick to what I see as the default if (a) the data is public or (b) the consumer isn't particularly familiar with the data.
    In the other graphs, the peak at 70% could be perceived to be higher than it actually is.

    Remember that not everyone will look at the axis, and could see the 70% as slightly exaggerated because it is so near the top.
    For me, it's about the truthfulness of the data visualisation; in data visualisation there are always trade-offs. Sometimes it is right to change the Y axis, sometimes not. Since this was public data for everyone, I couldn't assume that everyone would look at the axis. If you read the original CNN article, its main thrust is that sales of netbooks plummeted, and I felt that capping the axis at 70% emphasised the 'plummet'. The point of my blog is that the dataviz isn't enough on its own, since there are contextual questions to answer for a more complete interpretation, e.g. what's the impact of other variables? I wanted to show the sales 'plummet' in more realistic proportion, since chopping the axis can skew the representation in a way that underlines the ideas of the original article rather than taking a fresh look.
    Finally, I wanted to thank you for your comments. I like people questioning or challenging me on the basis of what I say or write. TBH it helps me to think, to get better at what I do, and to see different sides of the same thing, which is what dataviz can help towards.
    What I liked about your questions is that you had valid points to make, and I wanted to thank you for your thoughtful reading and for taking the time to write your comments.
    Kind Regards,
    Jen

  3. We agree on the underlying motivation, but perhaps not on the execution 😉

    If the data had been from a set of values that could only have ranged from 0 to 100% (let's say you were plotting the market share of different netbook providers), then I agree with you that it is sometimes beneficial to extend the scale to 100%, as that helps to provide context.

    However, *for this dataset* the y axis covers a potentially infinite range of values. I don't understand why you would arbitrarily choose 100% as the optimum point at which to crop the graph. If anything, this could trick the viewer into thinking that was an absolute maximum value that could be reached.

    All three of the graphs use a linear scale, so the height of the data point for Dec-09 (70%) is always going to be twice as far away from the x axis as the data point for Feb-10 (35%). Any “skewing” that's going on is purely a visual illusion because the Dec-09 value is plotted relatively close to the frame of the graph.

    However, by artificially extending the y axis to cover values that aren't present in the data, you've shortened the scale of the y axis relative to the x axis. The effect is to give a shallower angle to the entire graph. I would argue that, rather than reducing distortion, you're actually introducing it.

    As an example, suppose you were working for Dell and you wanted to show that sales of netbooks were robust and unaffected by the iPad: you could show this data in a graph in which you'd extended the y axis up to 10,000%. At this scale, the entire range of values would be perceived as a straight horizontal line, and you could confidently reassure your shareholders that any talk of a “plummet” was idle speculation 😉
    You've not gone to this extreme, but you have gone some way to flattening the graph compared to the previous two examples.

    Anyway, I enjoy the debate and look forward to hearing from you (as always) 🙂

  4. Hi Jen

    Interesting debate over y axis. My opinion is that it depends on whether the actual value of the % is relevant or not.

    If you're showing % of a population where 100% is meaningful as a maximum then it's important to show the y axis from 0-100.

    In this case, however, there is no minimum or maximum, and to be honest the actual value of 70% isn't that meaningful. What's more relevant is the relative heights of each point, which show the change in direction of the line. You could almost get away without having a y axis, although that would be going too far 🙂

    What I think is more misleading is the title used in CNN's chart – 'notebook sales plummeted'. The chart actually shows positive growth in all but the last data point. So surely sales are still increasing not plummeting?!

    The chart should actually be showing unit sales, not % growth, but that wouldn't be as sensationalist.

    Alex

  5. I'm with Tanoshimi and Alex on the Y axis maximum. 100% growth is huge; most plots of growth won't get this high.

    What if the two high values for Nov-09 and Dec-09 are the unusual data points? If these were in line with the neighboring values (30 to 35%), would you still insist on a maximum of 100% which extended 2x higher than the data?

    If we were discussing a portion of a whole quantity, then using a maximum of 100% might be sensible.

  6. Hello Jon,
    First off – I have admired your blog for a long time and I'm honoured you've read mine and taken the time to comment, so I wanted to say 'Thank You' for your time.

    I think I agree with Alex on this one: there is no minimum or maximum, so it is somewhat arbitrary. I chose 100% because the data was expressed in percentages, and I agree it is difficult to be prescriptive, but that's what makes it interesting! So I wouldn't make this a hard-and-fast rule; I was trying to address my concern that, in the original graph, having the high values so close to the ceiling potentially distorted the dataviz in support of the rather sensationalist perspective taken on the data.
    I was just concerned that the outliers might look slightly exaggerated since they were so near the ceiling of the graph, especially since the original graph was aimed at 'shock value', plummeting sales and so on.

    I think that Tanoshimi's question, and I suppose you're asking the same thing, is 'what if?' In the scenarios you describe, I would probably add some contextual information, such as additional commentary, so that it was clear to the data consumer exactly what they were looking at.
    The graph on its own is pretty bald, and misses out a lot of context that could have been added, e.g. iPhone sales and so on.
    The original headline took a particular slant and I think that the original issue is more complex than the original yellow and pink (ugh!) bar chart showed.
    Ideally, I would have liked to add in this information as well; and then we could evaluate more easily where the 'cut-off' point is for the Y axis.

    Thank you again everyone. I have been delighted that people are interacting. I am learning whilst contributing and your comments have been very valuable to me.

    Kindest Regards,
    Jen

  7. The interesting things I take from this graph are dramatic effects, and the wonderful ability to make stats show us anything we like.

    The peaks in Nov-09 and Dec-09 are obviously Xmas sales, and as such they skew the overall look of the graph, but they give a great starting point for the 'decline'.

    Also look at what the graph is measuring: Year-on-Year Growth of Sales. What we are seeing is a slowing down of the growth, but essentially sales are still up on last year. It's not until Aug-10 that we actually see a decline in notebook sales. But the emphasis and the language around the graph lead us to believe that sales have actually been falling for over 6 months!

    It might be that last year notebook sales went through the roof for some particular reason, and so to maintain growth on that is impossible. Without seeing the 'base', growth figures can often be misleading!

    You can find all kinds of points in the data too. Why do sales grow the month after the announcement? Why point out that sales growth dipped in the month of the announcement? It's probably more to do with the post-Xmas lull than anything else.

    None of this is to detract from your points Jen, I just wanted to make the point that the graph has other flaws!!

  8. I have to give Joe's post my full endorsement. The story is really clear and it fits into a narrow blog post without having to squint or zoom. Great work.
