Introducction
R is one of the most wonderful language that has been larged used in statistical analysis and generating high quality figures. It’s totally free and open source, which is the most advantage over SAS. Comparing to other programming languages like C, C++, java, perl etc., R is very easy to learn. The other great advantage is that Rstudio, the graphical interface of R, is well compatible in both windows and Mac. Without further ado, let’s get started to learn some basics of R.
Quick start
Installing R is super easy. If you are using Mac, R is alredy installed. For Windows users, please go to R install to download the package. In this tutorial, we are gonna use the R graphical interface version Rstudio. For installing, please visit Rstudio to download the free Rstudio and install it just like how you install other softwares.
Data types
Scalar
Scalar is the most basic data type in R. A scalar only has one element, which can be “Integer: 8”, “Numeric: 0.8”, “Character: great”, “Logical: True”, “Complex: 1+2i” and “Raw”.
Creation
number <- 8
string <- "Hello world"
Operations
s1 <- "Hello"
s2 <- "world"
s <- paste(s1, s2)
n1 <- 2
n2 <- 4
n <- n1+n2
Vector
While a scalar conly have one elements, a vector have can have one or multiple elements. You just comebine them together with c(). Usually the the types of the elements in a vector, eithwe numeric or chacter. If the types of the elements in the vector are different, they will be converted to the same data type.
Creation
v1 <- c(n1, n2, s1, s2) # all elements in vector v1, including n1 and n2, are converted to characters
v2 <- c(n1, n2) # all elements in vector v1 are still numeric
Operations
v1[3] #access the third element of vector v1
List
list is very similar to vector but won’t convert different types of elements to the same data type.
Creation
list1 <- list(n1, n2, s1, s2, v1)
Operations
list1[[2]]
list1[[5]][2]
Matrix
A matric is just a two dimmentioanl vector. Usually created by the format of a <- matrix(vector, nrow= , ncol= , byrow=TRUE/FALSE, dimnames=list(rnames, cnames))
Creation
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow=2, ncol=3, byrow=TRUE, dimnames = list(c('a1', 'a2'), c('b1', 'b2', 'b3')))
Operations
m[1,] #access to the first row
m[,2] #access to the second collumn
m[2, 3] #access to a sigle element on the second row and third column
Array
An Array is pretty much a matrix but arrays can be multi-dimmentional. For bioinformatic analysis, we won’r use array for most of the time. In order not to confuse you, I’m not gonna give more details about array here. If you are very interested, please look up more in other places.
Data Frames
A data frame is similar to a matrix but can contain different types of data. Usually created with the format a <- data.frame(col1, col2, col3....)
Creation
df <- data.frame(
gender = c("Male", "Male","Female"),
height = c(152, 171.5, 165),
weight = c(81,93, 78),
Age = c(42,38,26)
)
Operations
df[2, ] #access to the second row
df[, 3] #access to the third column
df[2, 3] #access to a sigle element on the second row and third column
df["gender"] #access to the gender column
df$gender #the same as above
Fator
Factors in R represent the catagories of data.
Creation
sex <- c("male", "female", "female", "female")
sex <- factor(sex, levels=c("male", "female")) ##vector sex is catogarized into male or female