---
title: "R Basics"
author: "Marcos Sanches"
date: "`r Sys.Date()`"
output:
  rmdformats::readthedown:
    highlight: kate
---
```{r setup, echo=FALSE, cache=FALSE}
library(knitr)
library(rmdformats)
## Global options
options(max.print="75")
opts_chunk$set(echo=TRUE,
               cache=TRUE,
               prompt=FALSE,
               tidy=TRUE,
               comment=NA,
               message=FALSE,
               warning=FALSE)
opts_knit$set(width=75)
```
# R and R Markdown
R Markdown is just a text file with some specific formatting and functions that help you store your R scripts neatly.
**Without Markdown** - You run scripts and they disappear. Or you copy and paste your scripts into Word. It is hard to keep them organized, and even harder for someone else to understand them.
**With Markdown** - You have a nice document where explanatory text and R scripts take turns. You can easily write a script, explain it, and interpret the results while doing your analysis. And it will be organized in such a way that someone else will easily understand it. You can also convert it into a webpage or PDF and share it easily.
## Knit it!
To see how all this works, let's knit this document!
## Let's keep going...
Today we want you to get a general feeling for how R, RStudio and R Markdown work.
In this R Markdown document we have some basic operations in R that are good to know. But with R, as with any programming language, it takes time to get a good grasp of it. If you don't have experience with R, you may find it a bit abstract and hard to understand at first, but it will come with some practice and time.
A good thing to do is probably to force yourself to analyze your next dataset in R! It will take time and lots of trial and error, and it might be hard, but at the end of it you will feel like you have a new skill! And you can count on us along the way.
## Markdown
Let's just do some random things today!
Below you have a "chunk", where your R code goes. Inside a chunk, anything starting with a "#" is a comment and will not be run by R.
To run code inside a chunk you can select it and press CTRL + Enter on Windows (Cmd + Return on a Mac). When you do that, the selected code is automatically sent to the console, where it runs. You could have done that manually, by copying and pasting the code from the chunk to the console yourself and then pressing Enter. Or you could type the code directly in the console.
Having all your code in an R Markdown document, with comments and text, is a good practice in **Reproducible Research**. It means that someone else can rerun what you did, check your code, test it, modify it, etc. The analysis you did will be very transparent.
Markdown documents can also be transformed into PDF or HTML documents, which can be shared easily. It is also a nice way to produce a report that includes the code used! It is very flexible in terms of formatting text, although we will not get into that.
### Example
Here is a short script that we will use to show how all this works.
```{r}
459 / 251
# a vector
c(1, 2, 3, 4, 5)
# assigning it to a variable
a <- c(1, 2, 3, 4, 5)
# printing a
a
# accessing some of its positions
a[3:5]
# the function 'seq', just a random function
a <- seq(1, 100)
# Let's see what it did
a
# another random function, the 'sum'
sum(a)
```
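The square brackets used above accept more than a range of positions. As a small sketch (the vector `v` is made up for illustration), positions can also be dropped with a negative index or selected with a logical condition:

```{r}
# a made-up vector, used only for this illustration
v <- c(10, 20, 30, 40, 50)
# positions 2 to 4
v[2:4]
# everything except the first position
v[-1]
# only the values greater than 25
v[v > 25]
# positions 1 and 5
v[c(1, 5)]
```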
## Getting Help
The R Help is not considered awesome, but as you get more familiar with R you will find yourself using it more and more.
R is all about scripting and programming, and even expert R users often have to use the R Help, because nobody knows all commands and all packages.
You will also find it useful to search for help on the internet. The R documentation is a bit dry and you will often find more comprehensive explanations and examples online.
Here are some examples of how to get help.
```{r}
# Here is the function 'seq' again
a <- seq(1, 100, by = 2)
a
# What does 'seq' do and how do you use it?
?seq
# Here is another example of the 'seq' function used with a new argument 'length.out'
seq(from = 1, to = 100, length.out = 10)
# help for the function 'lm'
?lm
# and for the function 'glm'
help(glm)
```
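When you do not remember the exact name of a function, a few other base R tools may help. The sketch below uses `apropos()` to list objects whose names match a pattern and `formals()` to list the arguments of a function. The object names `seq_functions` and `seq_arguments` are made up for illustration. `??` and `example()` appear only as comments, because they open the help viewer or run a longer demonstration.

```{r}
# list the objects whose names start with 'seq'
seq_functions <- apropos("^seq")
seq_functions
# the argument names of the default 'seq' method
seq_arguments <- names(formals(seq.default))
seq_arguments
# search all installed help pages for a word (opens the help viewer)
# ??regression
# run the examples from the help page of 'seq'
# example(seq)
```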
# R objects
We will now show a few R objects that are important. You don't need to memorize them. This is just for you to start practicing and to get a feeling for R.
One thing you can start getting used to is the fact that R is case sensitive. **It matters if you use upper or lower case.**
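As a quick illustration of case sensitivity (the object names here are made up), two names that differ only in case refer to two different objects:

```{r}
# two different objects: the names differ only in case
score <- 10
Score <- 20
score
Score
# 'SCORE' was never created, so R does not know it
exists("SCORE")
# the same applies to functions: mean(score) works,
# but Mean(score) would fail with 'could not find function "Mean"'
mean(score)
```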
## Vector
Defining vectors with "c()" is something you will do often. Here are some examples of numeric and character vectors.
Most of the time when we do tasks in R we will not just create a numeric vector, say, but also store it in a variable. If you don't do that, it is printed once and then lost.
Below we store a numeric vector in a variable x, and a character vector in a variable y.
Once you have done that, it is easy to do operations with the vector.
```{r}
# Defining a numeric vector
c(10.4, 5.6, 3.1, 6.4, 21.7)
# Assigning the vector to a variable. Object x is now a numeric object.
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
x
# The function "class" shows you what type of object x is.
class(x)
# you can also do the assignment like this.
c(10.4, 5.6, 3.1, 6.4, 21.7) -> x
x
# and do operations with x.
1/x
round(1/x,3)
2*x
sum(x)
mean(x)
sd(x)
# y is a character vector (the numbers 1:4 are converted to text)
y <- c("a", "bcd", "j", 1:4)
y
class(y)
class(x)
```
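A vector holds values of a single type, which is why the numbers `1:4` in `y` above ended up as text. Here is a minimal sketch of this behaviour, with made-up object names:

```{r}
# mixing text and numbers gives a character vector
mixed <- c("a", 1:3)
mixed
class(mixed)
# numbers stored as text can be converted back with as.numeric
digits <- c("1", "2", "3")
as.numeric(digits) + 1
# text that is not a number becomes NA; suppressWarnings hides
# the warning R would otherwise print
suppressWarnings(as.numeric("abc"))
```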
## Missing values
In R, missing values are represented by **NA**.
In general, when you import data into R, blanks will be translated into NAs.
```{r}
# a numeric vector with a missing value at the end.
x <- c(10.4, 5.6, 3.1, 6.4, 21.7, NA)
# If there is a missing value, the mean is also missing (NA).
mean(x)
# So, you have to tell R to remove missing values (na.rm) before calculating the mean.
mean(x, na.rm = TRUE)
# NA also works for characters.
y <- c("a", "bcd", "j", 1:4, NA)
y
```
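To find out which values are missing, use `is.na()` rather than a comparison with `==`. A small sketch, using a made-up vector `z`:

```{r}
# a made-up vector with two missing values
z <- c(4, NA, 7, NA, 1)
# comparing with == does not find the NAs: the result is NA everywhere
z == NA
# is.na is the right tool
is.na(z)
# how many values are missing?
sum(is.na(z))
# keep only the values that are not missing
z[!is.na(z)]
```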
## Matrices
A matrix is a generalization of a vector to two dimensions, rows and columns (objects with more than two dimensions are called arrays).
You will probably not use matrices a lot in your statistical analyses, so we will keep this short.
Here we start learning how to access elements in an object, in this case a matrix: how to get the first row or column, or a given element. This is very important, and we will see a bit more of it in the data frame section below.
```{r}
# Generate the sequence 1, 2, 3, ..., 100 and ask R to organize it in a 10 x 10 matrix, filled row by row.
a <- matrix(seq(1,100), nrow = 10, byrow = TRUE)
a
dim(a)
# Accessing elements of a matrix
a
a[3,]
a[4,6]
a[,6]
a[3:5,]
a * a # element-by-element product
a %*% a # matrix product
```
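The `byrow` argument used above decides the order in which the values fill the matrix. A minimal sketch with a smaller, made-up example shows the difference:

```{r}
# by default R fills a matrix column by column
m_col <- matrix(1:6, nrow = 2)
m_col
# with byrow = TRUE it fills row by row
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)
m_row
# the element in row 2, column 1 differs between the two
m_col[2, 1]
m_row[2, 1]
# t() transposes a matrix (rows become columns)
dim(t(m_row))
```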
## Lists
Lists are also used a lot in R because they are very flexible. A list is basically a collection of objects, which can be of different types.
So, you could have a list that contains a regression output as its first element, its residuals as its second element, and a plot of the residuals as its third element.
We use double square brackets to access list elements by position, and "$" to access them by name.
```{r}
# creating a list
mylst <- list(name = "Fred", wife = "Mary", no.children = 3,
              child.ages = c(4, 7, 9))
mylst
class(mylst)
# accessing an element by its position
mylst[[3]]
# accessing an element by its name (if there is a name)
mylst$child.ages
# Adding a numeric vector, a character vector, a matrix and a list to a list.
list1 <- list(x,y,a,mylst)
list1
```
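Single and double square brackets behave differently on a list, which is a common source of confusion. A small sketch, using a made-up list `pet`:

```{r}
# a made-up list, used only for this illustration
pet <- list(name = "Rex", age = 5, toys = c("ball", "rope"))
# single brackets return a (smaller) list
class(pet["age"])
# double brackets return the element itself
class(pet[["age"]])
pet[["age"]] + 1
# adding a new element by name
pet$owner <- "Ana"
names(pet)
```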
# Data Frames
If you are doing data analysis, data frames are probably a key object for you.
Usually we will import data into R from other software, like Excel or SPSS. Once imported into R, the dataset will be a "data.frame" object.
Here we create a small data frame object and play a bit with it. In this course we will work a lot with data frames.
```{r}
# creating a toy dataset.
dt <- data.frame(name = c("John", "Mary", "Paul", "Mark", "Michelle", "Joanna"),
                 id = seq(1, 6),
                 age = c(28, 31, 12, 21, 41, 44),
                 treat = rep(c("Treatment", "Control"), each = 3),
                 PHQ9 = round(rnorm(6, 16, 4), 0))
dt
class(dt) # what type of object is dt?
names(dt) # what are the names of the elements within dt?
str(dt) # a description of dt
dt$name # getting the name component (variable) of dt
dt$name[4] # getting the fourth element of the 'name' variable of dt
dt[1, ] # first row, all columns
dt[, ] # all rows, all columns
dt[, 3:5] # all rows, columns 3 to 5
dt[, c(3, 5)] # all rows, columns 3 and 5
dt[, c("age", "PHQ9")] # all rows, columns age and PHQ9
dt$age # column age
dt$sqrt_age <- sqrt(dt$age) # new column with the square root of age
dt
dt[dt$age < 18, ] # filtering age < 18 and keeping all columns
dt$age[dt$age < 18] <- NA # set age to missing if age < 18
is.na(dt$age) # check which ages are missing
dt$PHQ9[is.na(dt$age)] <- NA # set PHQ9 to missing where age is missing
dt
```
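One thing to watch for once a column contains missing values: a condition such as `age < 18` evaluates to `NA` for those rows, and the filter then returns a row filled with NAs. Below is a minimal sketch with a made-up data frame (`kids` is not part of the course data). Wrapping the condition in `which()` is one common way around this.

```{r}
# a made-up data frame with one missing age
kids <- data.frame(id = 1:4, age = c(10, NA, 25, 15))
# the row with the missing age comes back as a row of NAs
kids[kids$age < 18, ]
# which() keeps only the positions where the condition is TRUE
kids[which(kids$age < 18), ]
# conditions can be combined with & (and)
kids[which(kids$age < 18 & kids$id > 1), ]
```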
# End!
That is it for today! We hope you got a feeling for how R works. Next week we will dive a little deeper into data frames by importing one into R and preparing it for analysis.
See you next week!