09_Tidymodels_Workflow Flashcards

(7 cards)

2
Q

What does recipes::recipe() do and how is it used?

A

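recipe() declares the outcome and predictors with a formula and a data set; step_*() functions then add preprocessing steps. The result is a preprocessing blueprint kept separate from the model, and nothing is estimated until prep().

Code:
library(tidymodels)
rec <- recipe(Species ~ ., data = iris) %>%
  step_normalize(all_numeric_predictors()) %>%  # center and scale numeric predictors
  step_dummy(all_nominal_predictors())          # dummy-code factor predictors (none in iris, so a no-op here)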

3
Q

How do you prep() and bake() a recipe?

A

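prep() estimates each step's parameters (for example, means and standard deviations) from the training data; bake() applies the estimated steps to data.

Code:
prepped <- prep(rec)
train_processed <- bake(prepped, new_data = NULL)  # NULL returns the processed training set
test_processed <- bake(prepped, new_data = iris[1:10, ])  # any new rows; a slice of iris stands in for test data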
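
Notes:
In practice the recipe is usually prepped on a training split only and then baked on held-out rows, so test data never influences the estimated means and standard deviations. A minimal sketch, assuming a 75/25 split made with rsample's initial_split(); the seed, split proportion, and object names are illustrative and not part of the card:

```r
library(tidymodels)

set.seed(123)  # illustrative seed
split <- initial_split(iris, prop = 0.75, strata = Species)
train <- training(split)
test  <- testing(split)

# Estimate normalization parameters from the training rows only
rec_split <- recipe(Species ~ ., data = train) %>%
  step_normalize(all_numeric_predictors())
prepped_split <- prep(rec_split, training = train)

# Apply the training-set parameters to both sets
train_baked <- bake(prepped_split, new_data = NULL)
test_baked  <- bake(prepped_split, new_data = test)
```

The normalized training columns have mean 0 and standard deviation 1; the test columns generally will not, because they are scaled with the training-set statistics.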

4
Q

How do you specify and fit a model with parsnip?

A

Define a model spec, set the engine, then fit().

Code:
# Requires the ranger package to be installed
spec <- rand_forest(mode = "classification", trees = 500) %>% set_engine("ranger")
fit <- fit(spec, Species ~ ., data = iris)
predict(fit, new_data = iris)  # returns a tibble with a .pred_class column

5
Q

How do workflows combine recipe and model?

A

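Build a workflow() with add_model() and add_recipe(), then fit() it. Fitting the workflow preps the recipe on the supplied data and trains the model in one step.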

Code:
wf <- workflow() %>% add_model(spec) %>% add_recipe(rec)
wf_fit <- fit(wf, data = iris)
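
Notes:
A fitted workflow also handles preprocessing at prediction time, so new data can be passed in raw. A minimal sketch, assuming the ranger engine is installed; the row indices and the name new_rows are illustrative:

```r
library(tidymodels)

spec <- rand_forest(mode = "classification", trees = 500) %>%
  set_engine("ranger")
rec <- recipe(Species ~ ., data = iris) %>%
  step_normalize(all_numeric_predictors())

wf_fit <- workflow() %>%
  add_recipe(rec) %>%
  add_model(spec) %>%
  fit(data = iris)

# predict() on a fitted workflow bakes new_data with the trained recipe first
new_rows <- iris[c(1, 51, 101), ]
preds <- predict(wf_fit, new_data = new_rows)
preds

# Pull out the underlying pieces if needed
extract_recipe(wf_fit)       # the prepped recipe
extract_fit_parsnip(wf_fit)  # the fitted parsnip model
```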

6
Q

How do you do cross-validation with rsample and tune?

A

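Use vfold_cv() for folds and tune_grid() to search hyperparameters that are marked with tune() in the model spec.

Code:
set.seed(1)
folds <- vfold_cv(iris, v = 5)
# trees must be tune(); a fixed trees = 500 leaves tune_grid() nothing to search
tune_spec <- rand_forest(mode = "classification", trees = tune()) %>% set_engine("ranger")
tune_wf <- workflow() %>% add_model(tune_spec) %>% add_recipe(rec)
grid <- grid_regular(trees(range = c(100, 500)), levels = 3)  # trees = 100, 300, 500
tuned <- tune_grid(tune_wf, resamples = folds, grid = grid)
collect_metrics(tuned)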
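
Notes:
After tuning, a common follow-up is to pick the best candidate, lock it into the workflow, and evaluate once on a held-out test set. A minimal sketch, assuming the ranger engine is installed and an initial_split() of iris; the split, seed, and object names are illustrative and not part of the card:

```r
library(tidymodels)

set.seed(1)
split <- initial_split(iris, prop = 0.75, strata = Species)
folds <- vfold_cv(training(split), v = 5)

tune_spec <- rand_forest(mode = "classification", trees = tune()) %>%
  set_engine("ranger")
tune_wf <- workflow() %>%
  add_recipe(recipe(Species ~ ., data = training(split))) %>%
  add_model(tune_spec)

grid <- grid_regular(trees(range = c(100, 500)), levels = 3)
tuned <- tune_grid(tune_wf, resamples = folds, grid = grid)

# Choose the trees value with the best resampled accuracy
best <- select_best(tuned, metric = "accuracy")

# Replace tune() with the chosen value, refit on the training set,
# and score on the test set in one call
final_wf  <- finalize_workflow(tune_wf, best)
final_fit <- last_fit(final_wf, split)
final_metrics <- collect_metrics(final_fit)
final_metrics
```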

7
Q

How do you measure model performance with yardstick?

A

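Use metric functions such as accuracy() and roc_auc() for classification, and rmse() and rsq() for regression. Each takes a data frame plus the truth and estimate columns.

Code:
library(yardstick)
# For classification: attach predictions as a column, then score
preds <- dplyr::bind_cols(iris, predict(fit, new_data = iris))
cls_metrics <- metric_set(accuracy)
cls_metrics(preds, truth = Species, estimate = .pred_class)  # scored on training data, so optimistic

Notes:
For regression, use rmse() and rsq() with numeric truth and estimate columns. roc_auc() needs class-probability columns (from predict(..., type = "prob")) rather than .pred_class.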
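
The regression case can be sketched the same way. This example assumes a linear model on mtcars, which is not part of the card; the dataset, formula, and object names are illustrative:

```r
library(tidymodels)

# Illustrative regression fit (dataset and formula are assumptions)
lm_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(mpg ~ wt + hp, data = mtcars)

# Attach numeric predictions (.pred) next to the true values
reg_preds <- bind_cols(mtcars, predict(lm_fit, new_data = mtcars))

# Score with a regression metric set
reg_metrics <- metric_set(rmse, rsq)
reg_results <- reg_metrics(reg_preds, truth = mpg, estimate = .pred)
reg_results
```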
