Machine Learning Foundations Flashcards

Grasp the core workflow and metrics behind machine learning model development. (25 cards)

1
Q

Define:

feature engineering

A

The process of selecting, modifying, or creating input variables (features) that help improve the performance of a machine learning model.

Good feature engineering can significantly enhance the accuracy of a model.
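
As a small illustration, the sketch below derives three engineered features from one raw record: a ratio, a numeric age computed from a date, and a calendar flag. The record schema ("total_spent", "num_orders", "signup_date") and the function name `engineer_features` are hypothetical and chosen only for this example.

```python
from datetime import date

def engineer_features(record, reference_date):
    """Derive model-ready features from one raw customer record.

    The field names used here are a hypothetical schema for illustration.
    """
    signup = date.fromisoformat(record["signup_date"])
    orders = record["num_orders"]
    return {
        # Ratio feature: spending per order, guarded against division by zero.
        "avg_order_value": record["total_spent"] / orders if orders else 0.0,
        # Turn a raw date into a numeric quantity a model can use directly.
        "account_age_days": (reference_date - signup).days,
        # Extract a calendar pattern (weekday(): Saturday = 5, Sunday = 6).
        "signed_up_on_weekend": signup.weekday() >= 5,
    }

raw = {"total_spent": 250.0, "num_orders": 5, "signup_date": "2024-01-01"}
print(engineer_features(raw, reference_date=date(2024, 3, 1)))
# {'avg_order_value': 50.0, 'account_age_days': 60, 'signed_up_on_weekend': False}
```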

2
Q

True or False:

Feature engineering is not necessary for machine learning models.

A

False

Effective feature engineering can greatly affect a model’s success by providing better inputs for learning.

3
Q

Fill in the blank:

The dataset is split into ______ and testing sets to evaluate a model’s performance.

A

training
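
A minimal sketch of a random train/test split using only the Python standard library. The helper name, the 80/20 ratio, and the fixed seed are illustrative assumptions; scikit-learn ships its own `train_test_split` in `sklearn.model_selection`, which this sketch does not reproduce.

```python
import random

def train_test_split(data, test_fraction=0.2, seed=42):
    """Shuffle a copy of the data, then split it into training and testing sets."""
    items = list(data)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_fraction)
    return items[n_test:], items[:n_test]

train, test = train_test_split(range(100), test_fraction=0.2)
print(len(train), len(test))  # 80 20
```

Shuffling before splitting matters when the data is stored in some order (for example, sorted by date or by class), since a plain slice would otherwise give the two sets different distributions.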

4
Q

What is the purpose of the training dataset?

A

To teach a machine learning model by providing examples that the model can learn from.

5
Q

What is the purpose of the testing dataset?

A

To evaluate the performance of a trained machine learning model on new, unseen data.

6
Q

What is model evaluation in machine learning?

A

The process of assessing how well a machine learning model performs on a given dataset using various metrics.

7
Q

True or False:

A model that performs well on training data will always perform well on new data.

A

False

A model can overfit: it learns the training data too well, including its noise, and then fails to generalize to new data.

8
Q

What does overfitting mean?

A

A situation where a model learns the training data too closely, capturing noise as if it were a true pattern, which negatively impacts its performance on new data.
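
An extreme, deliberately artificial illustration: the hypothetical `MemorizingModel` below stores every training example verbatim. The labels are random noise, so there is no real pattern; the model still scores perfectly on the training data because it has memorized the noise, but on unseen inputs it can only fall back to guessing the most common training label.

```python
import random

class MemorizingModel:
    """Deliberately overfit 'model': a lookup table of the training data."""

    def fit(self, xs, ys):
        self.table = dict(zip(xs, ys))
        # Fallback for unseen inputs: the most common training label.
        self.default = max(set(ys), key=ys.count)
        return self

    def predict(self, x):
        return self.table.get(x, self.default)

def accuracy(model, xs, ys):
    return sum(model.predict(x) == y for x, y in zip(xs, ys)) / len(ys)

rng = random.Random(0)
# Labels are pure noise: there is nothing generalizable to learn.
train_x = list(range(200))
train_y = [rng.randint(0, 1) for _ in train_x]
test_x = list(range(200, 400))
test_y = [rng.randint(0, 1) for _ in test_x]

model = MemorizingModel().fit(train_x, train_y)
print(accuracy(model, train_x, train_y))  # 1.0: the noise is memorized perfectly
print(accuracy(model, test_x, test_y))    # much lower on data it has never seen
```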

9
Q

What does underfitting mean?

A

A situation where a model is too simple to capture the underlying patterns in the data, resulting in poor performance on both training and testing data.

10
Q

Fill in the blank:

A common metric for evaluating classification models is ______.

A

accuracy

11
Q

What is accuracy in the context of model evaluation?

A

The proportion of correct predictions made by a machine learning model out of all predictions made.
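
The definition translates directly into code. This is a minimal sketch; the example labels are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# 3 of the 4 predictions are correct.
print(accuracy(["cat", "dog", "dog", "cat"], ["cat", "dog", "cat", "cat"]))  # 0.75
```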

12
Q

What is a limitation of using accuracy as an evaluation metric?

A

Accuracy can be misleading in imbalanced datasets where one class is much more common than the others.
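
A worked example with made-up numbers: in a dataset of 95 negatives and 5 positives, a "model" that always predicts the majority class reaches 95% accuracy while finding none of the positive cases.

```python
# 95 negative cases and 5 positive cases: a heavily imbalanced dataset.
y_true = [0] * 95 + [1] * 5
# A useless "model" that always predicts the majority class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
positives_found = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

print(accuracy)         # 0.95
print(positives_found)  # 0
```

Metrics that focus on the positive class, such as precision and recall, expose this failure where accuracy hides it.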

13
Q

What is precision in model evaluation?

A

The ratio of true positive predictions to the total number of positive predictions made, measuring the accuracy of positive predictions.
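
As a formula, precision = TP / (TP + FP). The sketch below computes it from counts; returning 0.0 when the model made no positive predictions is a convention assumed here, and the spam-filter numbers are hypothetical.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP): the share of positive predictions that were correct."""
    predicted_positive = tp + fp
    # Assumed convention: 0.0 when there were no positive predictions at all.
    return tp / predicted_positive if predicted_positive else 0.0

# A spam filter flags 10 emails; 8 really are spam (TP) and 2 are not (FP).
print(precision(tp=8, fp=2))  # 0.8
```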

14
Q

What is recall in model evaluation?

A

The ratio of true positive predictions to the total number of actual positive cases, measuring the model’s ability to find all the relevant cases.
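
As a formula, recall = TP / (TP + FN). Note that the denominator counts actual positives rather than predicted positives. The zero-denominator convention and the patient numbers are assumptions for illustration.

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN): the share of actual positives the model found."""
    actual_positive = tp + fn
    # Assumed convention: 0.0 when the data contains no actual positives.
    return tp / actual_positive if actual_positive else 0.0

# 10 patients actually have the condition; the model detects 6 (TP) and misses 4 (FN).
print(recall(tp=6, fn=4))  # 0.6
```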

15
Q

What does the F1 score represent?

A

A metric that combines precision and recall into a single score by taking their harmonic mean, 2 × (precision × recall) / (precision + recall). It is useful when both metrics matter, because a low value in either one pulls the score down.
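
A minimal sketch of the F1 formula, with the arithmetic mean printed alongside to show how the harmonic mean penalizes an imbalance between precision and recall. Returning 0.0 when both inputs are zero is an assumed convention.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0  # assumed convention when both metrics are zero
    return 2 * precision * recall / (precision + recall)

p, r = 0.5, 1.0
print(round(f1_score(p, r), 3))  # 0.667
print((p + r) / 2)               # 0.75, the arithmetic mean, for comparison
```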

16
Q

Which metric would you prioritize for a medical diagnosis model: precision or recall?

A

Recall

17
Q

Why would you prioritize recall for a medical diagnosis model?

A

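In medical diagnosis, it’s crucial to identify as many positive cases as possible so that patients receive necessary treatment. A missed case (false negative) is usually more costly than a false alarm (false positive), so accepting some extra false positives is a reasonable trade-off.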

18
Q

What is cross-validation in the context of model evaluation?

A

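A technique for estimating how well a model will generalize to an independent dataset by training and testing it multiple times on different splits of the data and averaging the results. In k-fold cross-validation, for example, the data is divided into k folds and each fold serves once as the test set.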
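
The sketch below generates the index splits for k-fold cross-validation. The function name `k_fold_indices` is hypothetical, and it assumes the data has already been shuffled; in practice you would train and score a model on each split and average the k scores.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        # Everything outside the current test fold is used for training.
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

for train, test in k_fold_indices(10, k=5):
    print(test)  # prints [0, 1], then [2, 3], and so on up to [8, 9]
```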

19
Q

Fill in the blanks:

______ ______ is the process of selecting the best features to use in a machine learning model.

A

Feature selection

20
Q

Why is feature selection important?

A

It helps reduce the complexity of a model, improves performance, and reduces the risk of overfitting by eliminating irrelevant or redundant features.
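
One of the simplest feature-selection filters drops features whose values barely vary, since a constant column carries no information for the model. The sketch below uses population variance from Python's `statistics` module; the function name, the threshold of 0.0, and the sample rows are assumptions for illustration, and many other selection methods (correlation filters, model-based importance) exist.

```python
from statistics import pvariance

def drop_low_variance_features(rows, threshold=0.0):
    """Keep only feature columns whose variance exceeds `threshold`.

    `rows` is a list of equal-length numeric feature vectors.
    Returns (indices of kept columns, filtered rows).
    """
    n_features = len(rows[0])
    kept = [j for j in range(n_features)
            if pvariance([row[j] for row in rows]) > threshold]
    return kept, [[row[j] for j in kept] for row in rows]

rows = [
    [1.0, 5.0, 0.2],
    [2.0, 5.0, 0.4],
    [3.0, 5.0, 0.1],
]
kept, filtered = drop_low_variance_features(rows)
print(kept)  # [0, 2]: column 1 is constant, so it is dropped
```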

21
Q

How does feature scaling affect machine learning models?

A

Feature scaling puts independent variables (features) on comparable ranges. This can speed up the convergence of gradient-based algorithms and keeps features with large numeric ranges from dominating distance-based methods such as k-nearest neighbors.

22
Q

What is the difference between normalization and standardization in feature scaling?

A

  • Normalization (min-max scaling) rescales the data to the range [0, 1].
  • Standardization (z-score scaling) rescales the data to have a mean of 0 and a standard deviation of 1.
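
Both scalings are short formulas: (x − min) / (max − min) for normalization and (x − mean) / standard deviation for standardization. The sketch below uses the population standard deviation; mapping a constant feature to all zeros is an assumption made here to avoid division by zero.

```python
from statistics import mean, pstdev

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumption: a constant feature maps to 0
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift to mean 0 and scale to (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # same assumption for constant features
    return [(v - mu) / sigma for v in values]

ages = [20, 30, 40, 50, 60]
print(min_max_normalize(ages))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standardize(ages))
```

In a real pipeline, the min, max, mean, and standard deviation are computed on the training set only and then reused to transform the test set, so no information from the test data leaks into training.
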
23
Q

What is a confusion matrix?

A

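A table that summarizes a classification model’s performance by comparing actual and predicted classes. For a binary classifier, its four cells count the true positives, false positives, false negatives, and true negatives.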
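
A minimal sketch that builds a confusion matrix by counting (actual, predicted) pairs. This version puts actual classes in rows and predicted classes in columns; other tools may use the transposed layout, so check the convention before reading off cells. The labels are made-up examples.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Cell [i][j] counts examples whose actual class is labels[i]
    and whose predicted class is labels[j]."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels] for actual in labels]

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])
for row in matrix:
    print(row)
# [3, 1]  actual 0: 3 true negatives, 1 false positive
# [1, 3]  actual 1: 1 false negative, 3 true positives
```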

24
Q

In a scenario where you need to recommend movies to users, which machine learning subfield would be most applicable?

A

Collaborative filtering, a technique within machine learning used in recommendation systems.
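
A toy sketch of user-based collaborative filtering: estimate a user's rating for an unseen movie as a similarity-weighted average of other users' ratings, with cosine similarity computed over the movies both users have rated. The users, movies, and ratings are hypothetical data invented for illustration.

```python
from math import sqrt

# Hypothetical user -> {movie: rating} data, for illustration only.
ratings = {
    "ana":   {"Alien": 5, "Up": 1, "Heat": 4},
    "ben":   {"Alien": 4, "Up": 2, "Heat": 5, "Jaws": 4},
    "chloe": {"Alien": 1, "Up": 5, "Jaws": 2},
}

def cosine_similarity(a, b):
    """Cosine similarity over the movies both users rated."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)

def predict_rating(user, movie):
    """Similarity-weighted average of other users' ratings for `movie`."""
    num = den = 0.0
    for other, their_ratings in ratings.items():
        if other == user or movie not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        num += sim * their_ratings[movie]
        den += abs(sim)
    # None means no other user rated this movie, so there is nothing to go on.
    return num / den if den else None

print(predict_rating("ana", "Jaws"))
```

The prediction for Ana lands between Chloe's 2 and Ben's 4 and closer to Ben's, because Ben's ratings resemble Ana's more. With raw ratings that are all positive, cosine similarity still gives Chloe a positive weight despite her opposite tastes; mean-centering each user's ratings (as in Pearson correlation) is a common refinement.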

25
Q

Why is it important to have separate training and testing datasets?

A

To ensure that the performance evaluation of the model is unbiased and reflects its ability to generalize to unseen data.