variable
name assigned to a value, and stored in global environment
RStudio shortcut for the assignment operator <- is Alt + -
naming
legal names in R must begin with a letter (or with a . that is not followed by a number)
. _ and numbers are all allowed after the first character (_ and numbers can't come first)
case sensitive
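A minimal sketch of assignment and the naming rules above. The variable names are made up for illustration.

```r
# Assign a value to a name with <- ; the name is then stored in the global environment.
plant_height <- 12.5

# Letters, numbers, . and _ are all allowed after the first character.
plant.height_2 <- 20

# Names are case sensitive: Plant_height is a different name and has not been defined.
exists("plant_height")
exists("Plant_height")
```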
functions
carry out a calculation
no side effects: they don't permanently change their arguments
args() prints a summary of a function's main arguments and their default values
arguments
the parts inside the parentheses of a function
a function may have multiple arguments
separated by ,
argument names don't have to be specified if the values are given in the default order
name-value pairs
arguments given as name-value pairs
‘name=value’
function nesting
read from inside out
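The points on functions, arguments and nesting can be sketched with the base R functions round() and sqrt(). The value of x is arbitrary.

```r
x <- 3.14159

# args() prints the main arguments of a function and their defaults.
args(round)

# Arguments given in the default order do not need names.
round(x, 2)

# The same call written with name-value pairs; named arguments can go in any order.
round(x = x, digits = 2)
round(digits = 2, x = x)

# No side effects: x itself is unchanged after the calls above.
x

# Nested functions are read from the inside out: sqrt() runs first, then round().
round(sqrt(10), digits = 1)
```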
vectors
1-dimensional data structure for storing a set of values
elements = the individual values in a vector; length() gives the number of elements
c() combines values (or vectors) into one vector
atomic vectors
contain data of 1 type
(e.g. all integers or all characters)
[1] at the start of printed output is the index of the first element shown on that line
(e.g. [1] 2 is a vector whose first element is 2)
numeric vectors
numbers (stored as double or integer)
character vectors
character strings
have to put ' or " around character strings
logical vectors
elements take only 2 values: TRUE or FALSE (written without quotes)
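A short sketch of the vector types above, built with c(). The vector names and values are invented for illustration.

```r
heights <- c(1.2, 3.4, 5.6)        # numeric (double) vector
counts  <- c(1L, 2L, 3L)           # integer vector (the L suffix makes an integer)
species <- c("oak", "ash", "elm")  # character vector, values in quotes
is_tall <- c(FALSE, TRUE, TRUE)    # logical vector, no quotes

# length() gives the number of elements; typeof() gives the type of data stored.
length(heights)
typeof(heights)
typeof(counts)
typeof(species)
typeof(is_tall)

# c() also combines existing vectors with new values.
more_heights <- c(heights, 7.8)

# Atomic vectors hold 1 type, so mixed input is converted to a single type.
mixed <- c(1, "a", TRUE)
typeof(mixed)
```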
relational operators
x < y
x > y
x <= y
x >= y
x == y
x != y
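As a quick illustration, each relational operator compares element by element and returns a logical vector. The values of x and y are arbitrary.

```r
x <- c(1, 5, 10)
y <- 5

x < y    # TRUE FALSE FALSE
x > y    # FALSE FALSE TRUE
x <= y   # TRUE TRUE FALSE
x >= y   # FALSE TRUE TRUE
x == y   # FALSE TRUE FALSE
x != y   # TRUE FALSE TRUE
```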
statistical variables
anything we can control or measure
data frames
table-like object with rows and columns
columns = statistical variables (each a vector of the same length, e.g. <chr>, <dbl>, or <int>)
rows = related observations
data.frame() creates a data frame
extracting variables
use double square brackets after the data frame name, with the variable name in quotes
or use $ after the data frame name (variable name doesn't need quotes)
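A minimal sketch of building a data frame and extracting a variable both ways. The data frame plants and its columns are made up for illustration.

```r
# Each column is a vector of the same length; each row is one observation.
plants <- data.frame(
  species = c("oak", "ash", "elm"),
  height  = c(12.5, 9.1, 15.3),
  count   = c(4L, 7L, 2L)
)

# Double square brackets: variable name in quotes.
plants[["height"]]

# Dollar sign: variable name without quotes.
plants$height

# Number of rows (observations) and columns (variables).
nrow(plants)
ncol(plants)
```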
packages
collection of folders and files combining code, data, and documentation for sharing between computers
CRAN website hosts most publicly available R packages
CRAN Task Views list which packages are useful for your type of data analysis
packages must be installed (once) and loaded and attached (each session)
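The install-once, load-each-session pattern looks roughly like this. dplyr is used only as an example package, and the install line is commented out so that running the sketch does not download anything.

```r
# Install once per computer (needs an internet connection).
# install.packages("dplyr")

# Load and attach at the start of each R session.
library(dplyr)

# Attached packages appear on the search path.
"package:dplyr" %in% search()
```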
tidyverse
collection of R packages
data wrangling
cleaning and manipulating data ready for analysis
tidy data
1 variable = 1 column
1 observation = 1 row
e.g. if biomass was measured at 2 time points, can’t have a T1 biomass column and T2 biomass column as this splits biomass across 2 columns
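The biomass example can be sketched as below. This assumes the tidyr package is installed and uses its pivot_longer() function to reshape the data, which is one possible approach and not something these notes prescribe. The data values and column names are invented.

```r
library(tidyr)

# Not tidy: biomass is split across 2 columns, one per time point.
wide <- data.frame(
  plot       = c("A", "B"),
  biomass_T1 = c(10.2, 8.7),
  biomass_T2 = c(12.9, 9.4)
)

# Tidy: 1 biomass column, plus a column recording the time point.
tidy <- pivot_longer(
  wide,
  cols         = c(biomass_T1, biomass_T2),
  names_to     = "time_point",
  names_prefix = "biomass_",
  values_to    = "biomass"
)

tidy
```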
tibbles
tidyverse version of a data frame
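A small sketch of making a tibble, assuming the tibble package is installed (it comes with the tidyverse). The column names and values are hypothetical.

```r
library(tibble)

# Build a tibble directly, in the same way as data.frame().
plants <- tibble(
  species = c("oak", "ash"),
  height  = c(12.5, 9.1)
)

# Printing a tibble shows each column's type, such as <chr> and <dbl>.
plants

# Convert an existing data frame to a tibble.
old_df <- data.frame(x = 1:3)
new_tbl <- as_tibble(old_df)
is_tibble(new_tbl)
```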
dplyr
helps manipulate rectangular data
functions:
- glimpse
- select
- mutate
- filter
- arrange
- summarise
select
selects certain variables and (optionally) renames them
variable names don't need quotes, but if a name contains a space, wrap it in backticks ``
select all but certain variables by putting ! before their name
rename using name-value (<new> = <variable>)
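The select() behaviours above, sketched on a made-up tibble. This assumes dplyr and tibble are installed; plants and its columns are hypothetical.

```r
library(dplyr)

plants <- tibble::tibble(
  species     = c("oak", "ash", "elm"),
  `height cm` = c(1250, 910, 1530),
  count       = c(4L, 7L, 2L)
)

# glimpse() gives a compact overview of the variables and their types.
glimpse(plants)

# Select variables by name, without quotes.
picked <- select(plants, species, count)

# A name containing a space goes in backticks.
spaced <- select(plants, `height cm`)

# Select everything except count.
dropped <- select(plants, !count)

# Select and rename at the same time with <new> = <variable>.
renamed <- select(plants, height = `height cm`, species)
```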
mutate
creates new variables from pre-existing ones, and keeps the original variables
don't put quotes around variable names
name the new variable using name-value (<new> = <calculation>)
can make multiple new variables at the same time, separated by ,
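A minimal mutate() sketch, assuming dplyr is installed. The tibble and the new variable names are invented for illustration.

```r
library(dplyr)

plants <- tibble::tibble(
  species   = c("oak", "ash", "elm"),
  height_cm = c(1250, 910, 1530),
  count     = c(4L, 7L, 2L)
)

# Make 2 new variables in one call, separated by a comma.
# The second one uses the variable created just before it.
plants_new <- mutate(
  plants,
  height_m        = height_cm / 100,
  height_per_tree = height_m / count
)

# The original variables are kept alongside the new ones.
names(plants_new)
```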
rename
renames variables and keeps all the others; use when you only want to rename, not select as well
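For comparison with select(), a short rename() sketch, again assuming dplyr is installed and using a made-up tibble.

```r
library(dplyr)

plants <- tibble::tibble(
  species   = c("oak", "ash", "elm"),
  height_cm = c(1250, 910, 1530),
  count     = c(4L, 7L, 2L)
)

# rename() uses <new> = <variable> and keeps every column in its original position.
plants_renamed <- rename(plants, height = height_cm)

names(plants_renamed)
```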