
# Data Science VC follow up: Intro to R and Statistics for the Rookie Part 1

I’ve decided to run a five-week introductory course on statistics and R. Here, I am sharing the slide deck and the code. The video will go up on our Data Science Virtual Chapter YouTube channel, which is accessible from here.

In the first week, we talked about the relationship between statistics and data visualisation, and why a good grounding in both is so useful. The slides are here, followed by the code:

You can copy and paste the R code into an RStudio script:

```r
# List the example datasets that come with R; iris is one of them
data()

# Let's look at the data
# str() shows the structure: what does R see when it sees 'iris'?
str(iris)

# What are the attributes?
attributes(iris)

# Let's see more of the data (this prints all 150 rows)
iris
```
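Printing `iris` dumps all 150 rows to the console. As a small sketch that was not part of the original session, `dim()` and `head()` give a quicker first look:

```r
# Number of rows and columns: 150 flowers, 5 variables
dims <- dim(iris)
print(dims)

# Only the first six rows, instead of all 150
first_rows <- head(iris)
print(first_rows)
```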

```r
class(iris)
# A data frame has columns which can have different types.
# The column names and types constitute the schema.
# How do we know what is in our data frame?

# Column names
colnames(iris)
```
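The code comments note that column names and types together make up the schema. One way to see both at once is a sketch using base R's `sapply()` to apply `class()` to every column; `levels()` then lists the categories of the factor column:

```r
# Apply class() to every column: names are the column names, values are the types
column_types <- sapply(iris, class)
print(column_types)

# Species is a factor; levels() lists its categories
species_levels <- levels(iris$Species)
print(species_levels)
```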

```r
# How can we see the data in one of the columns?
iris$Petal.Length

# We could also use iris[, 3] to get the same column, since Petal.Length is the third column
iris[, 3]
```
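To confirm that `$` and numeric column indexing return the same values, here is a quick check with `identical()`. Double brackets with the column name, `[[`, are a third equivalent form; the variable names are only illustrative:

```r
by_name   <- iris$Petal.Length
by_index  <- iris[, 3]
by_double <- iris[["Petal.Length"]]

# All three return the same numeric vector of 150 petal lengths
print(identical(by_name, by_index))
print(identical(by_name, by_double))
```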

```r
# How can we see the first five rows?
iris[1:5, ]

# How can we see the Petal Length of the first five rows?
iris[1:5, "Petal.Length"]
```
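Rows can also be selected with a logical condition instead of by position. A minimal sketch, where picking out the setosa flowers is just an illustration:

```r
# The first five petal lengths, selected by row position and column name
first_five <- iris[1:5, "Petal.Length"]
print(first_five)

# Keep the rows where the condition is TRUE; the empty column index keeps all columns
setosa <- iris[iris$Species == "setosa", ]
print(nrow(setosa))
```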

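```r
# summary() shows descriptive statistics for each variable:
# min, quartiles, median and mean for numeric columns, counts for the factor
summary(iris)

# How many flowers of each species?
table(iris$Species)
```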
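`summary()` reports each statistic over the whole dataset. Since this is a statistics course, it may help to see that the mean is simply the sum divided by the count, and that base R's `tapply()` computes a statistic separately for each species. A small sketch:

```r
x <- iris$Petal.Length

# The mean by hand: sum of the values divided by how many there are
mean_by_hand <- sum(x) / length(x)
print(all.equal(mean_by_hand, mean(x)))

# Mean petal length within each species
species_means <- tapply(x, iris$Species, mean)
print(species_means)
```

Grouping shows what the overall mean hides: setosa petals are far shorter than those of the other two species.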

```r
# Let's have some dataviz fun! A simple scatter plot first:
plot(iris$Petal.Length, iris$Petal.Width, main = "Anderson's Iris Data")
# The plot appears in the Plots pane of RStudio (on the right-hand side in the default layout).

# We can make it slightly more interesting: pch = 23 draws filled diamonds,
# and bg colours each point by species
plot(iris$Petal.Length, iris$Petal.Width, pch = 23,
     bg = c("orange", "blue", "green")[unclass(iris$Species)],
     main = "Anderson's Iris Data")
```
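The `bg = c(...)[unclass(iris$Species)]` trick works because a factor is stored as integer codes (1 for setosa, 2 for versicolor, 3 for virginica), and those codes index the colour vector. Note that `bg` only fills plotting symbols 21 to 25, which is why `pch = 23` is used. The sketch below builds the colours explicitly and adds a legend with `legend()`; the variable names and axis labels are illustrative additions:

```r
species_colours <- c("orange", "blue", "green")

# as.integer() on a factor gives its codes: 1 = setosa, 2 = versicolor, 3 = virginica
codes <- as.integer(iris$Species)
point_colours <- species_colours[codes]

plot(iris$Petal.Length, iris$Petal.Width, pch = 23, bg = point_colours,
     main = "Anderson's Iris Data",
     xlab = "Petal length (cm)", ylab = "Petal width (cm)")

# pt.bg fills the legend symbols to match the points
legend("topleft", legend = levels(iris$Species),
       pch = 23, pt.bg = species_colours)
```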

```r
# We can make it even more interesting: a scatter plot matrix of the four measurements
pairs(iris[1:4], main = "Anderson's Iris Data", pch = 23,
      bg = c("orange", "green", "blue")[unclass(iris$Species)])
```
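`pairs()` draws every pairwise scatter plot; `cor()` gives the matching Pearson correlation coefficients as numbers. The tight linear pattern between petal length and petal width in the plots shows up as a correlation close to 1. A short sketch:

```r
# Pearson correlation matrix of the four numeric measurements
cor_matrix <- cor(iris[1:4])
print(round(cor_matrix, 2))
```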
```r
# Pie charts!
pie(table(iris$Species))
```
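A pie chart encodes proportions. As a sketch, `prop.table()` shows the same proportions as numbers; in iris each species makes up exactly one third of the flowers, which is why the slices are equal:

```r
species_counts <- table(iris$Species)

# Each count divided by the total count
species_share <- prop.table(species_counts)
print(round(species_share, 3))
```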

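```r
# Ooh, 3D! The next two plots need add-on packages; install them once with
# install.packages("scatterplot3d") and install.packages("rgl")
library(scatterplot3d)
scatterplot3d(iris$Petal.Width, iris$Sepal.Length, iris$Sepal.Width)

# Ooh, even more 3D! rgl draws an interactive plot you can rotate with the mouse
library(rgl)
plot3d(iris$Petal.Width, iris$Sepal.Length, iris$Sepal.Width)
```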
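`library()` stops with an error if a package is not installed, as one of the comments on this post points out. A small hypothetical helper, `ensure_package()` (not part of the original course), installs a package only when `requireNamespace()` cannot find it and then attaches it:

```r
# Hypothetical helper: install a package only if it is missing,
# then attach it with library()
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Usage for the 3D plots:
# ensure_package("scatterplot3d")
# ensure_package("rgl")
```

`requireNamespace()` checks for the package without attaching it, so the install step only runs the first time.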

```r
# Save this session's command history to a file in your home directory
savehistory("~/Topic 1 Getting familiar with R A.Rhistory")
```

## 8 thoughts on “Data Science VC follow up: Intro to R and Statistics for the Rookie Part 1”

1. Yibo says:

Hi Jen, Thanks for sharing the code. It might be worth documenting the version of R you were using for this demo. Also, people who did not attend this session would not know that you have to install a package with install.packages("yourpackage") before loading it with the library command. By the way, the "bg=" argument never worked for me in the script; it might have to do with my version of R (3.1.2). Do you have any idea?

1. Yibo,
Thank you for your comment. The blog is an accompaniment to the video series. It is not a replacement. I put a lot of time and effort into creating the video, and I hope that people will appreciate the effort.
Regards,
Jen

2. Vince Neville says:

Hi Jen,

I enjoyed watching your Introduction to R and Statistics for the Rookie five-week series, part 1 of 5, on your YouTube site.

Are parts 2 and 3 available on YouTube as well?

I could not locate either of those 2 sessions.

Thanks,

Vince Neville

1. Hi there, yes, I’ll do it as soon as I can. I’ve had a small operation and I’m still recovering, and I haven’t been very well 😦 I will post everything as soon as I can, probably over the weekend. Thank you for your interest!

1. Vince Neville says:

Get well soon!

3. Marshall Dixon says:

Hi Jen, thanks for posting your notes here on the site. I’m looking forward to learning more and sharing what I find. Where can parts 3 and 4 be found?
Thank you.
