key terms Flashcards

Question

transmute

Answer 1

Creates new variables from pre-existing ones and drops all the original variables. To keep some original variables unaltered, add them as arguments.
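A minimal sketch of the behaviour described on this card, assuming the dplyr package is installed. The `birds` tibble and its values are made up for illustration.

```r
library(dplyr)

# Toy data with made-up values (an assumption, not from the notes)
birds <- tibble(
  species = c("Adelie", "Gentoo", "Chinstrap"),
  bill_length_mm = c(39.1, 47.5, 49.0),
  bill_depth_mm = c(18.7, 15.0, 18.5)
)

# Naming species keeps it unaltered; every other original column is dropped
ratios <- transmute(birds, species, bill_ratio = bill_length_mm / bill_depth_mm)
names(ratios)
```

Swapping transmute() for mutate() would keep all the original columns alongside the new one.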

Answer 2

Filters observations (rows) using relational and logical operators, so only the observations that meet the criteria are returned. Variable names don't need quotes, but character string values do.

Answer 3

Link relational conditions with logical operators: x & y (and), x | y (or).
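As a rough illustration of combining relational and logical operators inside dplyr's filter(), the sketch below uses a made-up `birds` tibble. The column names are assumptions chosen to match the examples elsewhere in these notes.

```r
library(dplyr)

# Toy data with made-up values
birds <- tibble(
  species = c("Adelie", "Gentoo", "Chinstrap", "Gentoo"),
  bill_length_mm = c(39.1, 47.5, 49.0, 44.0)
)

# Variable names are unquoted; the character value "Gentoo" is quoted
long_gentoo <- filter(birds, species == "Gentoo" & bill_length_mm > 45)

# | keeps rows that meet either condition
either <- filter(birds, species == "Adelie" | bill_length_mm > 48)
```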

Answer 4

Arranges rows according to the values of one or more variables. Use desc() for descending order.
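A short sketch of sorting in both directions with dplyr's arrange(), using a made-up tibble.

```r
library(dplyr)

# Toy data with made-up values
birds <- tibble(
  species = c("Adelie", "Gentoo", "Chinstrap"),
  bill_length_mm = c(39.1, 47.5, 49.0)
)

# Ascending order is the default
shortest_first <- arrange(birds, bill_length_mm)

# desc() reverses the order for that variable
longest_first <- arrange(birds, desc(bill_length_mm))
```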

Answer 5

Summarises variables according to the functions used. The tibble returned doesn't need to be the same length as the initial vectors. New variables can be named at the same time using name-value pairs.

Answer 6

ignores NA values

Answer 7

Performs calculations on individual groups, e.g. mean bill length by species. Can group by multiple variables. Missing values create extra groups.
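The grouping, summarising, and NA-handling cards can be sketched together as follows. This assumes the "ignores NA values" card refers to the `na.rm = TRUE` argument, which the notes do not state. The data are made up.

```r
library(dplyr)

# Toy data with made-up values, including one missing measurement
birds <- tibble(
  species = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  bill_length_mm = c(38, 40, NA, 46)
)

# One summary row per species; mean_bill is named with a name-value pair
means <- birds %>%
  group_by(species) %>%
  summarise(mean_bill = mean(bill_length_mm, na.rm = TRUE))
```

Without `na.rm = TRUE`, the Gentoo mean would be NA because one of its values is missing.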

Answer 8

Pipe operator %>% (shortcut = Ctrl + Shift + M). %>% means take the result of the expression on the LHS and use it as the argument where the . placeholder is. The . can be left out, and R then assumes it to be the first argument.
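A small sketch of the placeholder rule, assuming the pipe is loaded via dplyr (it comes from the magrittr package). The numbers are arbitrary.

```r
library(dplyr)

x <- c(1, 4, 9)

# No placeholder: the LHS becomes the first argument, i.e. sqrt(x)
implicit <- x %>% sqrt()

# Explicit placeholder in the first position gives the same result
explicit <- x %>% sqrt(.)

# Placeholder elsewhere: the LHS is used as `by`, i.e. seq(1, 10, by = 2)
steps <- 2 %>% seq(1, 10, by = .)
```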

Answer 9

used with dplyr functions starts_with, contains, case_when, etc.
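As a hedged illustration of the helpers named on this card, the sketch below uses starts_with() and contains() inside select(), and case_when() inside mutate(). The tibble, the 45 mm cut-off, and the `size` labels are made up.

```r
library(dplyr)

# Toy data with made-up values
birds <- tibble(
  species = c("Adelie", "Gentoo"),
  bill_length_mm = c(39.1, 47.5),
  bill_depth_mm = c(18.7, 15.0),
  body_mass_g = c(3750, 5000)
)

# Pick columns by name pattern
bill_cols <- select(birds, starts_with("bill"))
length_cols <- select(birds, contains("length"))

# case_when() assigns a value per row from the first condition that is TRUE
sized <- mutate(birds, size = case_when(
  bill_length_mm > 45 ~ "long",
  TRUE ~ "short"
))
```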

Answer 10

Numeric: continuous or discrete (continuous variables can be made discrete by the measuring apparatus). Categorical: ordinal or nominal.

Answer 11

Ratio: has a meaningful 0 (represents the absence of a quantity); can add/subtract and multiply/divide on the scale; often used for physical quantities (e.g. one tree is twice as tall as another).
Interval: has an arbitrary 0 (doesn't represent the absence of a quantity); can add/subtract on the scale, but not multiply/divide; e.g. dates (would not say 1000 AD is twice as late as 500 AD).

Answer 12

Sample = small group drawn from the wider population. Exploratory data analysis explores the properties of samples.

Answer 13

Central tendency (averages), dispersion (variance, interquartile range, and standard deviation), and associations (Pearson's correlation coefficient if linear, Spearman's rank correlation if nonlinear but monotonic, etc.).

Answer 14

Shows which combinations of categories are common in categorical data. Returns a contingency table.
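The card does not name the function it describes, so the sketch below assumes base R's table() as one way to build a contingency table (xtabs() is another). The vectors are made up.

```r
# Toy categorical data with made-up values
species <- c("Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo")
island <- c("Dream", "Biscoe", "Biscoe", "Biscoe", "Biscoe")

# Cross-tabulate: one cell per species/island combination
tab <- table(species, island)
tab
```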

Answer 15

- graphics package: standard R package, flexible
- lattice package: good for multivariable relationships
- ggplot2 package: easy to make sophisticated plots

Answer 16

- at least 1 layer (associated with data and rules on how to display it)
- a scale for each aesthetic mapping
- a coordinate system per plot
- a facet specification if using a multi-panel plot
(Add graphical objects using +.)

Answer 17

5 components:
- data (in the form of a data frame / tibble)
- aesthetic mappings (describe how data is associated with aesthetics such as position, colour, size of points)
- geometric object, aka geom (how to present the info, i.e. graph type)
- statistical transformation, aka stat (transforms the raw data, if not summarised first in dplyr)
- position adjustment (tweaks the position of layer elements, i.e. how info for categories is separated, e.g. whether bars are stacked or side-by-side)

Answer 18

How variable info is mapped to aesthetic properties (every aesthetic mapping used must have a scale so that data can be mapped onto it). If there are multiple plots, they must all have the same scale for shared aesthetic mappings.

Answer 19

takes position of objects (points, lines, etc) and maps them onto the plot

Answer 20

Breaks the dataset into subsets, with a different plot for each. Each plot has the same layers, scales, etc.

Answer 21

geom_TYPE e.g. scattergraphs are geom_point

Answer 22

stat = "identity" and position = "identity": used when we want to plot data without modification.

Answer 23

ggplot(data, aes()) + geom_TYPE(). Add comments using # to narrate what you're doing at each step.

Answer 24

e.g. aes(x=bill_length_mm, y=bill_depth_mm, colour=species)
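Putting the skeleton and the aesthetic mapping together might look like the sketch below. It assumes ggplot2 is installed and uses a made-up `birds` data frame whose column names match the mapping on this card.

```r
library(ggplot2)

# Toy data with made-up values
birds <- data.frame(
  species = c("Adelie", "Gentoo", "Chinstrap"),
  bill_length_mm = c(39.1, 47.5, 49.0),
  bill_depth_mm = c(18.7, 15.0, 18.5)
)

# Data and aesthetic mappings go in ggplot(); the geom is added as a layer with +
scatter <- ggplot(birds, aes(x = bill_length_mm, y = bill_depth_mm, colour = species)) +
  geom_point()  # scatter graph: one point per row
```

Printing `scatter` draws the plot in the active graphics device.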

Answer 25

Operates on the whole figure, e.g. can split data up by species or by island or both.
- facet_wrap(): wraps a 1D sequence of panels into a 2D matrix with rows and columns (no empty panels); used for a single grouping variable
- facet_grid(): 2D matrix of panels in rows and columns (empty panels if a combination doesn't exist); used for 2+ categorical variables
Need ~ before the variable you want to facet by.
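A rough sketch of both faceting functions on a made-up data frame. The Gentoo/Dream combination is deliberately absent, so facet_grid() leaves that panel empty.

```r
library(ggplot2)

# Toy data with made-up values
birds <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  island = c("Dream", "Biscoe", "Biscoe", "Biscoe"),
  bill_length_mm = c(39.1, 38.0, 47.5, 46.0),
  bill_depth_mm = c(18.7, 18.0, 15.0, 14.5)
)

base_plot <- ggplot(birds, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point()

# One panel per species, wrapped into rows and columns
wrapped <- base_plot + facet_wrap(~ species)

# Rows by island, columns by species
gridded <- base_plot + facet_grid(island ~ species)
```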

Answer 26

e.g. make summary layers of means over the raw data point layers

Answer 27

Specify arguments in the geom_TYPE function, e.g. shape, size, transparency (alpha), etc. colours() prints the available colour choices.

Answer 28

The breaks argument of scale_AES_TYPE adjusts the intervals/guides.

Answer 29

A feature of the whole plot, not a layer, so use the labels function (labs()), not an argument of the geom.

Answer 30

Adjusts all visual elements not adjusted by geoms or scales (i.e. the non-data parts such as background colour, grid lines, label positions, fonts). Use the theme() function. So many adjustments are possible that people usually google specific adjustments as and when needed. There are some standard themes, e.g. theme_bw() (put any additional theme changes after setting the standard theme type).
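The customisation cards (geom arguments, scale breaks, labels, themes) can be sketched in one plot. The data, label text, and break positions below are made up for illustration.

```r
library(ggplot2)

# Toy data with made-up values
birds <- data.frame(
  species = c("Adelie", "Gentoo", "Chinstrap"),
  bill_length_mm = c(39.1, 47.5, 49.0),
  bill_depth_mm = c(18.7, 15.0, 18.5)
)

styled <- ggplot(birds, aes(x = bill_length_mm, y = bill_depth_mm, colour = species)) +
  geom_point(size = 3, alpha = 0.6) +                       # geom arguments
  scale_x_continuous(breaks = seq(35, 50, by = 5)) +        # axis intervals
  labs(title = "Bill dimensions (toy data)",
       x = "Bill length (mm)", y = "Bill depth (mm)") +     # whole-plot labels
  theme_bw() +                                              # standard theme first
  theme(legend.position = "bottom")                         # extra tweaks after it
```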

Answer 31

Start with the basic skeleton of a plot and build it up, adding more customisation.

Answer 32

Useful for viewing distributions of large samples (>100). Need to bin the data first: can bin in dplyr, or get ggplot2 to do it for us (using the stat facility stat_bin()). Pick an appropriate bin width using the binwidth argument; increasing binwidth smooths the histogram. Only need to define the x axis of the histogram, as ggplot2 does the y axis for us. Use fill and colour to customise.
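A minimal histogram sketch. The sample is far smaller than the card recommends and is made up purely to keep the example short. The bin width of 2 is an arbitrary choice.

```r
library(ggplot2)

# Toy data with made-up values
birds <- data.frame(
  bill_length_mm = c(36, 37.5, 39, 40, 41, 44, 46, 47, 48.5, 50)
)

# Only x is mapped; ggplot2 bins the data and computes the counts for y
hist_plot <- ggplot(birds, aes(x = bill_length_mm)) +
  geom_histogram(binwidth = 2, fill = "steelblue", colour = "black")
```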

Answer 33

Good for visualising distributions of small samples. Bins are not evenly spaced. The number of stacked dots represents the height along the y axis.

Answer 34

Good for counting the frequency of occurrences within each category (use count()).
- geom_bar: counts the number in each category; don't supply a y aesthetic; use on raw data
- geom_col: uses y values (not counting), so useful when data is already summarised and you want to plot exact values
Can adjust labels, widths of bars, and colours. To reorder the bars, make a character vector of species names in order, then use the limits = argument when adjusting the scale. coord_flip flips the axes. To treat numerical values as categorical, convert them to a character vector using as.character().
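The difference between geom_bar and geom_col might be sketched as below, with made-up data. The bar order passed to `limits` is an arbitrary example.

```r
library(dplyr)
library(ggplot2)

# Toy raw data with made-up values: one row per bird
birds <- tibble(
  species = c("Adelie", "Adelie", "Chinstrap", "Gentoo", "Gentoo", "Gentoo")
)

# geom_bar counts the rows itself, so no y aesthetic is supplied
bars <- ggplot(birds, aes(x = species)) +
  geom_bar()

# geom_col plots values that are already summarised (count() adds column n)
counts <- count(birds, species)
cols <- ggplot(counts, aes(x = species, y = n)) +
  geom_col()

# Reorder the bars with a character vector passed to limits
reordered <- bars +
  scale_x_discrete(limits = c("Gentoo", "Adelie", "Chinstrap"))
```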

Answer 35

Usually shown by a scatter plot. Problem: scatter plots don't reveal over-plotting (when points are very close due to a large dataset, or there are many identical points due to a coarse scale). Solutions: use small points, reduce opacity, or change the geom type.

Answer 36

like histograms but in 2 dimensions (divides plane into rectangles or hexagons, darker fill colour = more cases at that point)

Answer 37

Gives a scatter graph where point size is scaled according to the number of cases.

Answer 38

Bar charts are usually used, with a separate bar for each combination of categories in the 2 variables. Defining 2 aesthetic mappings produces a stacked bar chart by default. To ensure all variables are treated as categorical, convert them to factors with factor() (used by R to represent categorical variables); use the levels argument to set the order. To make the bars side-by-side, set the position argument to "dodge".
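A sketch of a side-by-side bar chart for two categorical variables. The data are made up, and `year` stands in for a numeric variable that needs converting to a factor.

```r
library(ggplot2)

# Toy data with made-up values
birds <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"),
  year = c(2007, 2008, 2007, 2007, 2008)
)

# factor() makes year categorical; levels sets the order of the groups
birds$year <- factor(birds$year, levels = c(2007, 2008))

# Two aesthetic mappings (x and fill); position = "dodge" puts bars side by side
dodged <- ggplot(birds, aes(x = species, fill = year)) +
  geom_bar(position = "dodge")
```

Leaving out `position = "dodge"` gives the default stacked bars.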

Answer 39

Usually use a box plot. Looks at the distribution of a numerical variable within categories and highlights outliers. Alternative = make multiple histograms.

Answer 40

Overlay: set the position argument to "identity" and increase transparency, which shows overlaps. Faceting: use facet_wrap so each is on a different plot.

Answer 41

either use faceting to make a multi-panel plot (na.omit function gets rid of NA) or add extra aesthetic mapping

Answer 42

To plot means: bar chart showing just the means. Calculate the means using summarise, then plot using geom_col. More useful to plot error bars too.

Answer 43

Shows uncertainty. First calculate the means and standard errors (standard error = sd() divided by the square root of the sample size). Use geom_col and add geom_errorbar, which needs ymin and ymax aesthetic mappings, e.g. +/- 1 standard error. position = position_dodge(0.9) puts the error bars at the centre of each bar.
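The mean-and-error-bar workflow could be sketched as below, with made-up data. The standard error is computed as sd / sqrt(n), and the error bar width of 0.2 is an arbitrary choice.

```r
library(dplyr)
library(ggplot2)

# Toy data with made-up values
birds <- tibble(
  species = rep(c("Adelie", "Gentoo"), each = 3),
  bill_length_mm = c(38, 40, 42, 45, 47, 49)
)

# Mean and standard error per species
stats <- birds %>%
  group_by(species) %>%
  summarise(
    mean_bill = mean(bill_length_mm),
    se = sd(bill_length_mm) / sqrt(n())
  )

# Bars for the means, error bars spanning +/- 1 standard error
means_plot <- ggplot(stats, aes(x = species, y = mean_bill)) +
  geom_col() +
  geom_errorbar(aes(ymin = mean_bill - se, ymax = mean_bill + se), width = 0.2)
```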

Answer 44

Use geom_pointrange, which adds means and errors as a single layer.

Answer 45

Use geom_text(label = ). Put the labels in a data frame (separate from, or the same as, the one used for the plot).

Answer 46

ggsave function. First argument = path and name of the file. Set the file type with the device = argument.
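A minimal sketch of saving a plot, with made-up data. The file name, the temporary directory, and the 5 x 4 inch size are arbitrary choices for the example.

```r
library(ggplot2)

# Toy data with made-up values
birds <- data.frame(
  bill_length_mm = c(39.1, 47.5, 49.0),
  bill_depth_mm = c(18.7, 15.0, 18.5)
)

scatter <- ggplot(birds, aes(x = bill_length_mm, y = bill_depth_mm)) +
  geom_point()

# First argument is the path and file name; device sets the file type
out_file <- file.path(tempdir(), "scatter.pdf")
ggsave(out_file, plot = scatter, device = "pdf", width = 5, height = 4)
```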

Answer 47

Use an external package: cowplot. Make the individual plots first and assign them names, then use plot_grid() to print them in order. Control the number of rows (nrow) and columns (ncol). rel_widths and rel_heights control plot widths and heights.

Answer 48

Help files for each package give examples of uses of functions. help.start() shows the help files for each installed package:
- manuals
- reference
- packages
- search engine & key words
- miscellaneous material
- user manuals
- frequently asked questions

Answer 49

help(topic = ) only searches in loaded packages. Or just put ? before the function name.

Answer 50

- description (function overview)
- usage
- arguments
- details (how it behaves)
- value (what object is returned)
- references
- see also (related functions)
- examples

Answer 51

Give an account of a package's features. Use the vignette() function to view all available vignettes, or vignette(package = ) or vignette(topic = , package = ) to be more specific. browseVignettes() opens them in a browser.
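A short sketch of listing vignettes from code. It uses the grid package only because it ships with R; any installed package name could be substituted.

```r
# List the vignettes available for one package
available <- vignette(package = "grid")
class(available)

# Opening a single vignette by topic is left commented out
# because it launches a viewer:
# vignette("grid", package = "grid")
```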

(77 cards)