02_Tidyverse_Data_Wrangling Flashcards

Question 1

Q

Front

Question 2

Q

What does dplyr::filter() do?

Answer

A

Keeps the rows for which every condition is TRUE (like SQL WHERE); rows where a condition evaluates to NA are dropped.

Code:
library(dplyr)
df <- tibble(name = c("A", "B", "C"), score = c(80, 60, 90))
filter(df, score >= 80)  # keeps rows A and C

Notes:
Attaching dplyr masks stats::filter(); write dplyr::filter() explicitly when you cannot be sure which one is in scope.
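A small sketch of how several conditions and missing values behave in filter(). The tibble is hypothetical example data; the point is that comma-separated conditions are combined with AND, and an NA result drops the row.

```r
library(dplyr)

# Hypothetical example data; one score is missing
scores <- tibble(name  = c("A", "B", "C", "D"),
                 score = c(80, 60, NA, 95),
                 group = c("x", "y", "x", "x"))

# Comma-separated conditions are combined with AND
high_x <- filter(scores, score >= 80, group == "x")
# keeps A and D; C is dropped because NA >= 80 is NA

# Use | for OR, and is.na() to keep missing values on purpose
high_or_missing <- filter(scores, score >= 80 | is.na(score))
```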

Question 3

Q

What does dplyr::select() do?

Answer

A

Keeps (or, with a minus sign, drops) columns by name, position, or tidyselect helper functions.

Code:
library(dplyr)
select(iris, starts_with("Sepal"), Species)

Notes:
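Common tidyselect helpers: everything(), starts_with(), ends_with(), contains(). select() can rename while selecting (new = old) but drops every column not listed; use rename() to keep them all.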
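A few more select() patterns on the built-in iris data, sketched to show renaming while selecting, dropping with a minus sign, and reordering with everything().

```r
library(dplyr)

# Rename while selecting: only the listed columns are kept
s1 <- select(iris, species = Species, sepal_length = Sepal.Length)

# Drop columns with a minus sign (every column whose name contains "Width")
s2 <- select(iris, -contains("Width"))

# Move Species to the front and keep the remaining columns in their order
s3 <- select(iris, Species, everything())
```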

Question 4

Q

How do you rename columns with dplyr?

Answer

A

Use rename(data, new = old).

Code:
library(dplyr)
rename(mtcars, miles_per_gallon = mpg)
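When many names follow a pattern, rename_with() applies a function to the names of selected columns instead of listing each new = old pair. A minimal sketch using mtcars:

```r
library(dplyr)

# Upper-case the names of every column whose name starts with "d"
r1 <- rename_with(mtcars, toupper, starts_with("d"))
names(r1)  # disp and drat become DISP and DRAT; other names are unchanged
```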

Question 5

Q

What does dplyr::mutate() do?

Answer

A

Adds new columns or modifies existing ones; later expressions in the same call can use columns created earlier in that call.

Code:
library(dplyr)
mtcars %>% mutate(kpl = mpg * 0.425, efficient = kpl > 12)

Notes:
Use transmute() to return only the new columns (plus any grouping keys). As of dplyr 1.1.0, transmute() is superseded by mutate(.keep = "none").
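mutate() has a .keep argument that controls which existing columns survive. This sketch contrasts .keep = "none" (new columns plus grouping keys) with .keep = "used" (new columns plus the inputs they were computed from).

```r
library(dplyr)

# Only cyl (the grouping key) and the two new columns remain
eff <- mtcars %>%
  group_by(cyl) %>%
  mutate(kpl = mpg * 0.425, efficient = kpl > 12, .keep = "none")

# The new column plus mpg, the only existing column it used
used <- mtcars %>% mutate(kpl = mpg * 0.425, .keep = "used")
```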

Question 6

Q

How do you sort rows with dplyr::arrange()?

Answer

A

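Reorders rows by one or more columns, ascending by default; wrap a column in desc() for descending order. Later columns break ties in earlier ones.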

Code:
library(dplyr)
arrange(mtcars, desc(mpg), cyl)
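One detail worth remembering: for local data frames, arrange() puts NA values last in both ascending and descending order. A sketch with hypothetical data:

```r
library(dplyr)

# Hypothetical data with one missing value
d <- tibble(id = 1:4, value = c(3, NA, 1, 2))

asc        <- arrange(d, value)        # ids 3, 4, 1, 2 (NA last)
desc_order <- arrange(d, desc(value))  # ids 1, 4, 3, 2 (NA still last)
```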

Question 7

Q

How do you summarise by groups with dplyr?

Answer

A

Use group_by() followed by summarise(); the result has one row per group.

Code:
library(dplyr)
mtcars %>% group_by(cyl) %>% summarise(mean_mpg = mean(mpg), .groups = "drop")

Notes:
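Set .groups ("drop", "drop_last", "keep") to control the grouping of the result; without it, summarise() drops the last grouping level and may print a message. n() counts rows per group; across() applies a function to several columns.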
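A sketch of grouping by two variables with n(), showing how .groups changes whether the result is still grouped.

```r
library(dplyr)

# .groups = "drop" returns an ungrouped tibble
by_cyl_gear <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(n = n(), mean_mpg = mean(mpg), .groups = "drop")

# .groups = "drop_last" removes only gear, so the result stays grouped by cyl
by_cyl <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(n = n(), .groups = "drop_last")
```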

Question 8

Q

How do you apply a function across multiple columns? (across)

Answer

A

Use across() inside mutate() or summarise(): give it a tidyselect column selection and a function (or a named list of functions).

Code:
library(dplyr)
iris %>% summarise(across(starts_with("Sepal"), mean))
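Two further across() patterns, sketched on iris: a named list of functions (one output column per column/function pair, named via .names), and across() inside mutate() with where() and a lambda.

```r
library(dplyr)

# Per-species mean and sd of each Sepal column
sepal_stats <- iris %>%
  group_by(Species) %>%
  summarise(across(starts_with("Sepal"), list(mean = mean, sd = sd),
                   .names = "{.col}_{.fn}"))

# Rescale every numeric column by its maximum, in place
scaled <- iris %>% mutate(across(where(is.numeric), ~ .x / max(.x)))
```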

Question 9

Q

How do you bind or stack data frames in dplyr?

Answer

A

Use bind_rows() to stack; bind_cols() to concatenate columns.

Code:
library(dplyr)
bind_rows(tibble(x=1), tibble(x=2))
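bind_rows() matches columns by name and fills missing ones with NA; its .id argument records which input each row came from. bind_cols() needs inputs with the same number of rows. The tibbles here are hypothetical.

```r
library(dplyr)

a <- tibble(x = 1, y = "a")
b <- tibble(x = 2)  # no y column

# Named inputs plus .id give a "source" column; b's missing y becomes NA
stacked <- bind_rows(first = a, second = b, .id = "source")

# Side-by-side combination of two 2-row tibbles
wide <- bind_cols(tibble(x = 1:2), tibble(z = c("p", "q")))
```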

Question 10

Q

Explain left_join/right_join/inner_join/full_join in dplyr.

Answer

A

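Relational joins that match rows of x and y on key columns. left_join() keeps all rows of x, right_join() keeps all rows of y, inner_join() keeps only rows whose key appears in both, and full_join() keeps all rows of both; unmatched cells are filled with NA.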

Code:
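library(dplyr)
x <- tibble(id = c(1, 2, 3), a = c("x1", "x2", "x3"))
y <- tibble(id = c(2, 3, 4), b = c("y2", "y3", "y4"))
left_join(x, y, by = "id")   # ids 1, 2, 3
inner_join(x, y, by = "id")  # ids 2, 3
right_join(x, y, by = "id")  # ids 2, 3, 4
full_join(x, y, by = "id")   # ids 1, 2, 3, 4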

Notes:
semi_join() keeps the rows of x that have a matching key in y; anti_join() keeps the rows of x with no match in y. Neither adds columns from y.
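A sketch of the filtering joins on hypothetical customer and order tables. Unlike inner_join(), semi_join() never duplicates rows of x when a key matches several rows of y.

```r
library(dplyr)

# Hypothetical tables: customer 1 has two orders, customer 2 has none
customers <- tibble(id = c(1, 2, 3), name = c("Ann", "Bo", "Cy"))
orders    <- tibble(id = c(1, 1, 3), amount = c(10, 5, 7))

# Customers with at least one order (Ann appears once, no amount column)
with_orders <- semi_join(customers, orders, by = "id")

# Customers with no orders
without_orders <- anti_join(customers, orders, by = "id")
```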

Question 11

Q

What does tidyr::pivot_longer() do?

Answer

A

Lengthens data: collapses several columns into name/value pairs (more rows, fewer columns).

Code:
library(tidyr); library(dplyr)
w <- tibble(id = 1, a = 10, b = 20)
pivot_longer(w, cols = a:b, names_to = "var", values_to = "value")
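pivot_longer() can also clean up the column names as it lengthens. This sketch, on a hypothetical weekly-rank table, selects columns with a helper, strips a prefix with names_prefix, and converts the result with names_transform.

```r
library(tidyr); library(dplyr)

# Hypothetical wide table: one column per week
ranks <- tibble(song = c("s1", "s2"), wk1 = c(5, 3), wk2 = c(4, 1))

long <- pivot_longer(ranks, cols = starts_with("wk"),
                     names_to = "week", names_prefix = "wk",
                     names_transform = list(week = as.integer),
                     values_to = "rank")
```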

Question 12

Q

What does tidyr::pivot_wider() do?

Answer

A

Widens data: spreads name/value pairs into separate columns (the inverse of pivot_longer()).

Code:
library(tidyr); library(dplyr)
l <- tibble(id = c(1, 1), var = c("a", "b"), value = c(10, 20))
pivot_wider(l, names_from = var, values_from = value)
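When some id/name combinations are missing from the long data, pivot_wider() fills them with NA unless values_fill gives a default. A sketch with hypothetical data where id 2 has no "b" value:

```r
library(tidyr); library(dplyr)

l2 <- tibble(id = c(1, 1, 2), var = c("a", "b", "a"), value = c(10, 20, 30))

w_na   <- pivot_wider(l2, names_from = var, values_from = value)
w_zero <- pivot_wider(l2, names_from = var, values_from = value, values_fill = 0)
```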

Question 13

Q

How do you split and combine columns? (separate/unite)

Answer

A

Use separate() to split one character column into several; use unite() to paste several columns into one.

Code:
library(tidyr); library(dplyr)
ym <- tibble(date = "2024-07") %>% separate(date, into = c("year", "month"), sep = "-")
unite(ym, "date", year, month, sep = "-")  # back to "2024-07"
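In tidyr 1.3.0 and later, separate() is superseded by the separate_wider_*() family. A sketch assuming tidyr >= 1.3.0, splitting on a delimiter and then converting the character pieces to integers:

```r
library(tidyr); library(dplyr)

# Requires tidyr >= 1.3.0 for separate_wider_delim()
ym2 <- tibble(date = c("2024-07", "2023-12")) %>%
  separate_wider_delim(date, delim = "-", names = c("year", "month"))

# The pieces are character; convert them if numbers are needed
ym2_num <- ym2 %>% mutate(across(c(year, month), as.integer))
```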

Question 14

Q

How do you handle missing data with tidyr?

Answer

A

drop_na() removes rows containing NA (in any column, or only in the columns you name); replace_na() replaces NAs using a named list of per-column values.

Code:
library(tidyr)
drop_na(airquality, Ozone)
tidyr::replace_na(tibble(x=c(1,NA)), list(x=0))
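A sketch contrasting drop_na() with and without named columns, and using replace_na() on a single vector inside mutate(). The data are hypothetical.

```r
library(tidyr); library(dplyr)

# Hypothetical data with NAs in different columns
dat <- tibble(x = c(1, NA, 3), y = c("a", "b", NA))

all_complete <- drop_na(dat)     # keeps row 1 only (any column NA drops the row)
x_complete   <- drop_na(dat, x)  # keeps rows 1 and 3 (only x is checked)

# On a vector, replace_na() takes a single replacement value
filled <- dat %>% mutate(x = replace_na(x, 0), y = replace_na(y, "unknown"))
```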

Question 15

Q

How do you detect duplicates and unique rows in dplyr?

Answer

A

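distinct() returns unique rows (optionally considering only selected columns); base duplicated() returns a logical vector flagging each repeated row after its first occurrence.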

Code:
library(dplyr)
distinct(mtcars, cyl, gear)
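By default distinct(df, cols) returns only the listed columns; .keep_all = TRUE keeps every column, taking the first row of each combination. Base duplicated() is handy for inspecting, rather than dropping, the repeats. A sketch on mtcars:

```r
library(dplyr)

# First row of each cyl/gear combination, with all 11 columns
first_per_combo <- distinct(mtcars, cyl, gear, .keep_all = TRUE)

# Flag every repeat of a cyl/gear combination after its first occurrence
dup_flags <- duplicated(mtcars[, c("cyl", "gear")])
dupes <- mtcars[dup_flags, ]
```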

(15 cards)