R Basic Tutorial

Basic data structures in R

Posted by Yingshan Li on February 8, 2020

Introduction

R is one of the most popular languages for statistical analysis and for generating high-quality figures. It's completely free and open source, which is its biggest advantage over SAS. Compared to other programming languages such as C, C++, Java and Perl, R is very easy to learn. Another great advantage is RStudio, a graphical interface for R that works well on both Windows and Mac. Without further ado, let's get started with some R basics.

Quick start

Installing R is super easy. R does not come preinstalled on Mac or Windows, so download the installer for your system from CRAN (the Comprehensive R Archive Network). In this tutorial, we are going to use RStudio, a graphical interface for R. After installing R, download the free RStudio Desktop from the RStudio website and install it just like any other software.

Data types

Scalar

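A scalar is the most basic data type in R. A scalar has only one element, whose type can be integer (e.g. 8L), numeric (e.g. 0.8), character (e.g. "great"), logical (e.g. TRUE), complex (e.g. 1+2i) or raw. Strictly speaking, R has no separate scalar type: a scalar is simply a vector of length one.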
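To see which type R uses for a value, you can call typeof(). The sketch below creates one value of each basic type. The variable names are just for illustration. Note that a plain 8 is stored as a double, and you need the L suffix to get an integer.

```r
# One length-one value of each basic type
x_int <- 8L          # the L suffix makes an integer
x_num <- 0.8         # a numeric (double) value
x_chr <- "great"     # a character value
x_lgl <- TRUE        # a logical value
x_cpx <- 1+2i        # a complex value
x_raw <- as.raw(255) # a raw byte, rarely needed in everyday analysis

typeof(x_int)   # "integer"
typeof(8)       # "double": plain numbers are doubles, not integers
typeof(x_num)   # "double"
typeof(x_chr)   # "character"
typeof(x_lgl)   # "logical"
typeof(x_cpx)   # "complex"
typeof(x_raw)   # "raw"
length(x_num)   # 1: a "scalar" is just a vector of length one
```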

Creation

number <- 8	# a numeric (double) value; write 8L for a true integer
string <- "Hello world"	# a character value

Operations

s1 <- "Hello"
s2 <- "world"
s <- paste(s1, s2)	# joins the two strings with a space: "Hello world"

n1 <- 2
n2 <- 4
n <- n1 + n2	# 6
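paste() and the arithmetic operators have a few variants worth knowing. This short sketch reuses the values above. It shows the sep argument of paste(), its no-separator sibling paste0(), and several other arithmetic operators.

```r
s1 <- "Hello"
s2 <- "world"
paste(s1, s2)             # "Hello world": paste() separates with a space by default
paste(s1, s2, sep = ", ") # "Hello, world": a custom separator
paste0(s1, s2)            # "Helloworld": paste0() uses no separator

n1 <- 2
n2 <- 4
n1 + n2   # 6
n1 * n2   # 8
n2 / n1   # 2
n2 %% 3   # 1 (remainder of integer division)
n1 ^ 3    # 8 (power)
```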

Vector

While a scalar has only one element, a vector can have one or more elements, which you combine with c(). All elements of a vector must have the same type, usually numeric or character. If you combine elements of different types, R converts them all to the most flexible type among them (logical < integer < numeric < character). Mixing numbers and text therefore gives a character vector.

Creation

v1 <- c(n1, n2, s1, s2)	# all elements in vector v1, including n1 and n2, are converted to characters

v2 <- c(n1, n2)	# both elements of vector v2 are still numeric
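You can confirm the conversion with class(). This sketch rebuilds v1 and v2 from the earlier values. It then shows each step of the coercion order logical < integer < double < character.

```r
n1 <- 2; n2 <- 4; s1 <- "Hello"; s2 <- "world"

v1 <- c(n1, n2, s1, s2)
v1             # "2" "4" "Hello" "world": the numbers became text
class(v1)      # "character"

v2 <- c(n1, n2)
class(v2)      # "numeric"

# Coercion always moves toward the more flexible type
typeof(c(TRUE, 5L))   # "integer": TRUE becomes 1L
typeof(c(1L, 2.5))    # "double": 1L becomes 1
c(TRUE, "a")          # "TRUE" "a": everything becomes character
```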

Operations

v1[3]	# access the third element of vector v1 ("Hello")
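Square brackets accept more than a single position. The sketch below uses a small made-up numeric vector to show several common indexing patterns. It also shows that arithmetic on a vector is applied element by element. Keep in mind that R counts positions from 1, not 0.

```r
v <- c(10, 20, 30, 40, 50)   # example values for illustration
v[3]          # 30: positions start at 1
v[c(1, 3)]    # 10 30: several positions at once
v[2:4]        # 20 30 40: a range of positions
v[-1]         # 20 30 40 50: a negative index drops that element
v[v > 25]     # 30 40 50: keep elements where the condition is TRUE
v * 2         # 20 40 60 80 100: arithmetic works element by element
length(v)     # 5
```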

List

A list is very similar to a vector, but it does not convert its elements to a common type. Each element keeps its own type and can even be a whole vector or another list.

Creation

list1 <- list(n1, n2, s1, s2, v1)

Operations

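list1[[2]]	# the second element of list1, which is n2 (4)
list1[[5]][2]	# the second element of v1, stored as the fifth element of list1 ("4")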
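Single and double brackets behave differently on lists. [[ ]] returns the element itself, while [ ] returns a shorter list. List elements can also be named and accessed with $. In the sketch below, the person list and its values are made-up examples.

```r
n1 <- 2; n2 <- 4; s1 <- "Hello"; s2 <- "world"
v1 <- c(n1, n2, s1, s2)
list1 <- list(n1, n2, s1, s2, v1)

list1[[2]]           # 4: double brackets return the element itself
list1[2]             # a list of length one that contains 4
list1[[5]][2]        # "4": second element of the vector stored in slot 5
class(list1[[1]])    # "numeric": the number was not converted to text

# Named elements can be accessed with $ (hypothetical example data)
person <- list(name = "Ann", age = 30, scores = c(90, 85))
person$name          # "Ann"
person$scores[2]     # 85
```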

Matrix

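A matrix is essentially a two-dimensional vector, so all of its elements must have the same type. It is usually created with the format a <- matrix(vector, nrow= , ncol= , byrow=TRUE/FALSE, dimnames=list(rnames, cnames)). With byrow=TRUE the values fill the matrix row by row; with byrow=FALSE (the default) they fill it column by column.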

Creation

m <- matrix(c(1, 2, 3, 4, 5, 6), nrow=2, ncol=3, byrow=TRUE, dimnames = list(c('a1', 'a2'), c('b1', 'b2', 'b3')))
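The byrow argument changes where each value ends up. This sketch fills two 2 x 3 matrices from the same six values, once row by row and once with the default column-by-column order. The printed layouts are shown as comments.

```r
vals <- c(1, 2, 3, 4, 5, 6)

m_row <- matrix(vals, nrow = 2, ncol = 3, byrow = TRUE)
m_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

m_col <- matrix(vals, nrow = 2, ncol = 3)   # byrow = FALSE is the default
m_col
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
```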

Operations

m[1, ]	# access the first row
m[, 2]	# access the second column
m[2, 3]	# access the single element in the second row and third column (6)
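Because m was created with dimnames, rows and columns can also be selected by name. The sketch below shows name-based indexing on the same matrix. It also shows a few basic matrix operations: dim(), the transpose t(), element-wise multiplication and matrix multiplication with %*%.

```r
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE,
            dimnames = list(c('a1', 'a2'), c('b1', 'b2', 'b3')))

m["a2", "b3"]   # 6: the same element as m[2, 3], selected by name
m[, "b2"]       # the second column as a named vector: a1 = 2, a2 = 5
dim(m)          # 2 3
t(m)            # transpose: 3 rows (b1 to b3) and 2 columns (a1, a2)
m * 10          # element-wise multiplication
m %*% t(m)      # matrix product of a 2 x 3 and a 3 x 2 matrix: a 2 x 2 result
```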

Array

An array is much like a matrix, but it can have more than two dimensions. In bioinformatic analysis we rarely need arrays, so to avoid confusing you I won't go into more detail here. If you are interested, look up ?array in R's built-in help.
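For the curious, here is a minimal sketch of creating and indexing a three-dimensional array with array(). The dimensions and values are arbitrary examples. As with matrices, the values are filled column by column.

```r
# A 3-dimensional array: 2 rows x 3 columns x 4 layers, filled with 1 to 24
a <- array(1:24, dim = c(2, 3, 4))
dim(a)        # 2 3 4
a[2, 3, 1]    # 6: row 2, column 3 of the first layer
a[, , 2]      # the second layer, returned as an ordinary 2 x 3 matrix
```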

Data Frames

A data frame is similar to a matrix, but different columns can hold different types of data. Within a single column, all values share one type. It is usually created with the format a <- data.frame(col1, col2, col3, ...).

Creation

df <- data.frame(
   gender = c("Male", "Male", "Female"),
   height = c(152, 171.5, 165),
   weight = c(81, 93, 78),
   Age = c(42, 38, 26)
)
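A data frame's shape and column types can be checked with a few built-in functions, shown in the sketch below on the same df. How gender is stored depends on your R version. In R 4.0.0 and later, data.frame() keeps text columns as character vectors by default (stringsAsFactors = FALSE). In earlier versions, text columns became factors.

```r
df <- data.frame(
   gender = c("Male", "Male", "Female"),
   height = c(152, 171.5, 165),
   weight = c(81, 93, 78),
   Age = c(42, 38, 26)
)

str(df)            # 3 obs. of 4 variables, with each column's type
dim(df)            # 3 4: rows, then columns
nrow(df)           # 3
names(df)          # "gender" "height" "weight" "Age"
sapply(df, class)  # the class of every column
```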

Operations

df[2, ]	# access the second row
df[, 3]	# access the third column (weight)
df[2, 3]	# access the single element in the second row and third column (93)
df["gender"]	# access the gender column, returned as a one-column data frame
df$gender	# access the gender column as a plain vector, the same as df[["gender"]]
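The difference between df["gender"] and df$gender shows up when you check their classes. The sketch below also shows two common operations: selecting rows with a logical condition and adding a new column. The column name Age_next_year is made up for illustration.

```r
df <- data.frame(
   gender = c("Male", "Male", "Female"),
   height = c(152, 171.5, 165),
   weight = c(81, 93, 78),
   Age = c(42, 38, 26)
)

class(df["gender"])                    # "data.frame": single brackets keep a data frame
is.data.frame(df$gender)               # FALSE: $ returns the column as a vector
identical(df$gender, df[["gender"]])   # TRUE: $ and [[ ]] are equivalent here

df[df$Age > 30, ]                      # only the rows where Age is above 30 (the first two)
df$Age_next_year <- df$Age + 1         # add a new column computed from an existing one
```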

Factor

Factors in R represent categorical data. A factor stores each value as one of a fixed set of categories, called levels.

Creation

sex <- c("male", "female", "female", "female")
sex <- factor(sex, levels=c("male", "female"))	# each value of sex is categorized as either male or female
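Once sex is a factor, you can inspect its levels and count how often each one occurs. The sketch below uses the same data. It also shows that each value is stored internally as the position of its level. Without an explicit levels argument, factor() sorts the levels alphabetically.

```r
sex <- c("male", "female", "female", "female")
sex <- factor(sex, levels = c("male", "female"))

levels(sex)       # "male" "female": the order given in levels =
nlevels(sex)      # 2
table(sex)        # counts per level: male 1, female 3
as.integer(sex)   # 1 2 2 2: each value is stored as the position of its level

# Without levels =, the levels are sorted alphabetically
levels(factor(c("male", "female")))   # "female" "male"
```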