Front
Back
What does dplyr::filter() do?
Keeps the rows for which every condition evaluates to TRUE (like SQL WHERE); rows where a condition is NA are dropped.
Code:
library(dplyr)
df <- tibble(name = c("A", "B", "C"), score = c(80, 60, 90))
filter(df, score >= 80)
Notes:
Attaching dplyr masks stats::filter(); call dplyr::filter() explicitly if another package masks it in turn.
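A minimal sketch of the namespace-qualified call, using a made-up scores table (the table and its values are assumptions for illustration). It also shows that comma-separated conditions are combined with AND and that NA rows are dropped.

```r
library(dplyr)

# Hypothetical data: one score is missing.
scores <- tibble(name = c("A", "B", "C", "D"), score = c(80, 60, 90, NA))

# Namespace-qualified call: unambiguous even if filter() is masked.
passed <- dplyr::filter(scores, score >= 80)

# Several conditions separated by commas must all be TRUE.
passed_not_c <- dplyr::filter(scores, score >= 80, name != "C")
```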
What does dplyr::select() do?
Keeps only the chosen columns, in the order given; supports selection helpers such as starts_with().
Code:
library(dplyr)
select(iris, starts_with("Sepal"), Species)
Notes:
Helpers include everything(), starts_with(), ends_with(), contains(). To rename while keeping all columns, use rename() with new = old.
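The helpers named above could be used as in this short sketch on the built-in iris data (the result names are chosen here for illustration).

```r
library(dplyr)

# Move Species to the front; everything() keeps the remaining columns in order.
species_first <- select(iris, Species, everything())

# Columns whose names end with "Width".
widths <- select(iris, ends_with("Width"))

# Columns whose names contain "Petal".
petals <- select(iris, contains("Petal"))
```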
How do you rename columns with dplyr?
Use rename(data, new = old).
Code:
library(dplyr)
rename(mtcars, miles_per_gallon = mpg)
What does dplyr::mutate() do?
Adds new columns or overwrites existing ones; later expressions can refer to columns created earlier in the same call.
Code:
library(dplyr)
mtcars %>% mutate(kpl = mpg * 0.425, efficient = kpl > 12)
Notes:
Use transmute() to return only new columns (+ grouping keys).
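A brief sketch of transmute() on the same conversion, assuming a dplyr version that supports the .keep argument of mutate() (1.0.0 or later); in recent dplyr releases mutate(.keep = "none") is the suggested replacement for transmute().

```r
library(dplyr)

# transmute() keeps only the columns it creates.
only_new <- mtcars %>%
  transmute(kpl = mpg * 0.425, efficient = kpl > 12)

# mutate() with .keep = "none" also drops the untouched columns.
also_only_new <- mtcars %>%
  mutate(kpl = mpg * 0.425, efficient = kpl > 12, .keep = "none")
```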
How do you sort rows with dplyr::arrange()?
Reorders rows by one or more columns.
Code:
library(dplyr)
arrange(mtcars, desc(mpg), cyl)
How do you summarise by groups with dplyr?
Use group_by() + summarise().
Code:
library(dplyr)
mtcars %>% group_by(cyl) %>% summarise(mean_mpg = mean(mpg), .groups = "drop")
Notes:
Set .groups = "drop" to get an ungrouped result; n() counts the rows in each group; across() applies a function to several columns.
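As an illustration of the points in these notes, the sketch below combines n() and across() in one summarise() call on mtcars (the output column name n_cars is chosen here for illustration).

```r
library(dplyr)

by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    n_cars = n(),              # number of rows in each cyl group
    across(c(mpg, hp), mean),  # group means of mpg and hp
    .groups = "drop"           # return an ungrouped tibble
  )
```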
How do you apply a function across multiple columns? (across)
Use across() within mutate/summarise.
Code:
library(dplyr)
iris %>% summarise(across(starts_with("Sepal"), mean))
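Since the card mentions mutate() as well, here is a hedged sketch of across() inside mutate(), plus a summarise() that applies two functions and names the outputs with .names (the naming pattern is one possible choice, not the only one).

```r
library(dplyr)

# Round every numeric column in place; Species is left untouched.
rounded <- iris %>%
  mutate(across(where(is.numeric), round))

# Apply two functions per column; .names controls the output column names.
sepal_stats <- iris %>%
  summarise(across(
    starts_with("Sepal"),
    list(mean = mean, sd = sd),
    .names = "{.col}_{.fn}"
  ))
```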
How do you bind or stack data frames in dplyr?
Use bind_rows() to stack; bind_cols() to concatenate columns.
Code:
library(dplyr)
bind_rows(tibble(x=1), tibble(x=2))
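A small sketch of both functions on made-up tibbles (the names first, second, and source are assumptions for illustration). bind_rows() fills columns missing from one input with NA, and .id records which input each row came from; bind_cols() matches rows by position.

```r
library(dplyr)

# Stack two tibbles; the second has an extra column y.
stacked <- bind_rows(
  first = tibble(x = 1),
  second = tibble(x = 2, y = "b"),
  .id = "source"
)

# Place two tibbles with the same number of rows side by side.
side_by_side <- bind_cols(tibble(x = 1:2), tibble(z = c("a", "b")))
```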
Explain left_join/right_join/inner_join/full_join in dplyr.
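Mutating joins that match rows of x and y on key columns: left_join keeps all rows of x, right_join all rows of y, inner_join only matched rows, full_join all rows of both.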
Code:
library(dplyr)
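# x and y are small hypothetical tables sharing the key column id
x <- tibble(id = c(1, 2), a = c("p", "q"))
y <- tibble(id = c(2, 3), b = c("r", "s"))
left_join(x, y, by = "id")
inner_join(x, y, by = "id")
right_join(x, y, by = "id")
full_join(x, y, by = "id")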
Notes:
semi_join() filters x to keys in y; anti_join() filters x to keys absent in y.
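The filtering joins in these notes can be sketched on two made-up tables (x, y, and their values are assumptions for illustration). Neither function adds columns from y.

```r
library(dplyr)

# Hypothetical tables sharing the key column id.
x <- tibble(id = c(1, 2, 3), name = c("a", "b", "c"))
y <- tibble(id = c(2, 3, 4), score = c(20, 30, 40))

# Rows of x that have a match in y.
matched <- semi_join(x, y, by = "id")

# Rows of x that have no match in y.
unmatched <- anti_join(x, y, by = "id")
```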
What does tidyr::pivot_longer() do?
Reshapes wide data to long: the selected column names go into one column (names_to) and their values into another (values_to).
Code:
library(tidyr); library(dplyr)
w <- tibble(id=1, a=10, b=20)
pivot_longer(w, cols = a:b, names_to = "var", values_to = "value")
What does tidyr::pivot_wider() do?
Reshapes long data to wide: the values of names_from become column names, filled with the values of values_from.
Code:
library(tidyr)
l <- tibble(id = c(1, 1), var = c("a", "b"), value = c(10, 20))
pivot_wider(l, names_from=var, values_from=value)
How do you split and combine columns? (separate/unite)
Use separate() to split one column; unite() to combine.
Code:
library(tidyr)
tibble(date = "2024-07") %>% separate(date, into = c("year", "month"), sep = "-")
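The card mentions unite() without showing it; a minimal sketch of the reverse operation, on a made-up one-row tibble, might look like this.

```r
library(tidyr)

# Hypothetical input with the date already split into parts.
parts <- tibble(year = "2024", month = "07")

# Combine year and month into a single date column; the inputs are removed by default.
joined <- unite(parts, "date", year, month, sep = "-")
```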
How do you handle missing data with tidyr?
drop_na() removes rows with NA; replace_na() replaces NAs by column.
Code:
library(tidyr)
drop_na(airquality, Ozone)
tidyr::replace_na(tibble(x=c(1,NA)), list(x=0))
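A short sketch on a made-up tibble with gaps in two columns (the data and the fill values are assumptions for illustration). The replacement list gives one value per column, and each value must match its column's type.

```r
library(tidyr)

# Hypothetical data with missing values in a numeric and a character column.
df <- tibble(x = c(1, NA), y = c("a", NA))

# Fill each column with its own replacement value.
filled <- replace_na(df, list(x = 0, y = "unknown"))

# With no columns named, drop_na() drops rows with an NA in any column.
complete_only <- drop_na(df)
```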
How do you detect duplicates and unique rows in dplyr?
distinct() returns unique rows; duplicated() (base) flags duplicates.
Code:
library(dplyr)
distinct(mtcars, cyl, gear)
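To contrast the two approaches, the sketch below runs base duplicated() and distinct() on a made-up tibble (the data is an assumption for illustration). duplicated() returns a logical vector marking repeats of an earlier row, while distinct() returns the de-duplicated rows.

```r
library(dplyr)

# Hypothetical data: the second row repeats the first.
df <- tibble(id = c(1, 1, 2), value = c("a", "a", "b"))

# Base R: TRUE for rows that repeat an earlier row.
dup_flags <- duplicated(df)

# dplyr: unique rows across all columns.
unique_rows <- distinct(df)

# Unique by id only, keeping the other columns from the first occurrence.
first_per_id <- distinct(df, id, .keep_all = TRUE)
```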