Machine Learning Flashcards

(144 cards)

1
Q

What is exploratory data analysis (EDA)?

A

The process of going through a dataset and discovering more about it.

EDA helps in understanding the data’s structure, patterns, and anomalies.

2
Q

What is model training?

A

Creating model(s) that learn to predict a target variable based on other variables.

This involves using training data to adjust the model parameters.

3
Q

What does model evaluation entail?

A

Evaluating a model’s predictions using problem-specific evaluation metrics.

Common metrics include accuracy, precision, recall, and F1 score.

4
Q

What is the purpose of model comparison?

A

Comparing several different models to find the best one.

This helps in identifying which model performs best under given conditions.

5
Q

What is model hyperparameter tuning?

A

Tweaking a model’s hyperparameters to improve it after finding a good model.

Hyperparameters are settings that are not learned from the data but are set prior to training.

6
Q

What does feature importance refer to?

A

Identifying the features/characteristics that matter most for predicting the target variable (e.g. heart disease, in a heart-disease project).

Feature importance helps in understanding which variables have the most influence on the prediction.

7
Q

What is cross-validation?

A

A method to ensure a good model works on unseen data.

It involves partitioning the dataset into subsets to evaluate the model’s performance.

8
Q

What should be included in reporting what we’ve found?

A

Presenting the work and findings in a clear manner.

This may include visualizations, key metrics, and insights derived from the analysis.

9
Q

What are attributes in the context of predictive modeling?

A

Attributes are the variables used to predict the target variable

10
Q

What are attributes also referred to as?

A

Independent variables

11
Q

What is the target variable in predictive modeling?

A

The dependent variable

12
Q

Fill in the blank: Attributes are also called _______.

A

features

13
Q

True or False: The target variable can also be referred to as an independent variable.

A

False

14
Q

What is an evaluation metric?

A

An evaluation metric is a problem-specific measure of model performance (e.g. accuracy), usually defined at the start of a project.

15
Q

Why can evaluation metrics change over time?

A

Because machine learning is very experimental.

16
Q

What example goal might a project start with?

A

Reach 95% accuracy at predicting whether or not a patient has heart disease.

17
Q

What is the purpose of setting a goal for machine learning engineers?

A

It provides a rough goal to work towards.

18
Q

What may happen to the project goal as it progresses?

A

It may have to be adjusted based on real-world testing.

19
Q

What are features in the context of data?

A

Different parts and characteristics of the data.

20
Q

What should you do during the step of identifying important features?

A

Start exploring what each portion of the data relates to and create a reference.

21
Q

What is a common way to document features of data?

A

Create a data dictionary.

22
Q

What is a data dictionary?

A

A data dictionary describes the data you’re dealing with.

It provides metadata about the data elements in a dataset.

23
Q

Do all datasets come with a data dictionary?

A

No, not all datasets come with data dictionaries.

This may require additional research or consultation with a subject matter expert.

24
Q

What should you do if a dataset does not have a data dictionary?

A

You may have to do your research or ask a subject matter expert.

A subject matter expert is someone who knows about the data.

25
Which library is commonly used for data analysis?
pandas

pandas is a powerful data manipulation and analysis library for Python.
26
What library is typically employed for numerical operations?
NumPy

NumPy provides support for large multi-dimensional arrays and matrices, along with a collection of mathematical functions.
27
What are the libraries used for data visualization?
Matplotlib, seaborn

Matplotlib is a plotting library for the Python programming language, and seaborn is built on top of Matplotlib for making statistical graphics.
28
Which library is utilized for machine learning modelling and evaluation?
Scikit-Learn

Scikit-Learn is a machine learning library that provides simple and efficient tools for data mining and data analysis.
29
Fill in the blank: For data analysis, you will likely use _______.
pandas
30
Fill in the blank: For numerical operations, the library of choice is _______.
NumPy
31
True or False: Matplotlib is used for machine learning modelling.
False

Matplotlib is primarily used for data visualization, not machine learning.
32
True or False: Scikit-Learn is a library for machine learning.
True
33
What does df.shape return?
A tuple of (rows, columns): df.shape # (rows, columns)
34
Once you've analysed the data (structured data), how do you start with the training of the model?
We're trying to predict our target/result/output variable using all of the other variables. To do this, we split the target variable from the rest by creating: X - our features (all variables except the target column), using df.drop("target", axis=1). y - our target variable, using df.target.to_numpy() (this extracts the target column as a NumPy array).
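The split above can be sketched with a tiny hypothetical DataFrame (the column names and values are assumptions for illustration):

```python
import pandas as pd

# A tiny hypothetical dataset with a "target" column.
df = pd.DataFrame({
    "age": [29, 54, 41, 63],
    "chol": [204, 250, 221, 286],
    "target": [0, 1, 0, 1],
})

# X: every column except the target; y: the target column as a NumPy array.
X = df.drop("target", axis=1)
y = df.target.to_numpy()
```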
35
Choosing a model
36
What is hyperparameter tuning?
The process of finding the best 'knobs' on a model before it learns from data.

Analogous to adjusting oven temperature and baking time in a cookie recipe to achieve the best outcome.
37
What are some examples of hyperparameters to tune?
* Learning rate
* Number of trees
* Regularization strength
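One common way to search over such settings is a small grid search; a minimal Scikit-Learn sketch, assuming synthetic data and an arbitrary pair of random-forest hyperparameters:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data purely for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Try a small grid of hyperparameter values ("knobs" set before training).
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [10, 50], "max_depth": [2, None]},
    cv=3,
)
grid.fit(X, y)
best_params = grid.best_params_  # the winning knob settings
```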
38
What is feature importance?
A measure of which inputs (features) the model relies on most heavily to make its decisions.

Similar to ranking clues in a mystery by their importance.
39
How do you determine feature importance?
Remove or shuffle one clue at a time and see how much the model's accuracy drops.
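The "shuffle one clue at a time" idea is what Scikit-Learn's permutation importance implements; a minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic data purely for illustration.
X, y = make_classification(n_samples=300, n_features=4, n_informative=2,
                           random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle each feature in turn and record how much the score drops.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
importances = result.importances_mean  # one score drop per feature
```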
40
What is a confusion matrix?
A table that shows how well a classifier got each decision right or wrong.
41
What do the terms TP, FP, FN, and TN represent in a confusion matrix?
* TP (True Positive): Model said 'Yes' and it was Yes
* FP (False Positive): Model said 'Yes' but it was No
* FN (False Negative): Model said 'No' but it was Yes
* TN (True Negative): Model said 'No' and it was No
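These four counts can be read straight off Scikit-Learn's confusion matrix; a small sketch with made-up labels:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predictions (1 = Yes, 0 = No).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels 0/1, sklearn orders the matrix [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
```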
42
What is cross-validation?
A way to test a model’s reliability by training and testing it on different slices of the data.
43
What is the process for cross-validation?
* Split data into k groups (folds)
* Train on k-1 folds, test on the remaining fold
* Average the results
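The three steps above are exactly what Scikit-Learn's cross_val_score does; a minimal sketch, assuming synthetic data and a logistic-regression model:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data purely for illustration.
X, y = make_classification(n_samples=150, random_state=1)

# 5-fold CV: train on 4 folds, test on the held-out fold, repeat, average.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
mean_score = scores.mean()
```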
44
What is precision?
Of all the times the model said 'Yes,' how many were actually Yes?
45
How is precision calculated?
$$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}$$
46
What is recall?
Of all the real Yes cases, how many did the model catch?
47
How is recall calculated?
$$\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}$$
48
What is the F1 score?
A single number that balances Precision and Recall when you care about both.
49
How is the F1 score calculated?
$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
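A quick arithmetic check of the three formulas above, using hypothetical counts TP=3, FP=1, FN=1:

```python
# Hypothetical confusion-matrix counts (assumption for illustration).
tp, fp, fn = 3, 1, 1

precision = tp / (tp + fp)  # 3/4 = 0.75
recall = tp / (tp + fn)     # 3/4 = 0.75
f1 = 2 * precision * recall / (precision + recall)  # also 0.75 here
```

When precision and recall are equal, F1 equals them both; the harmonic mean only drags the score down when the two disagree.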
50
What is a classification report?
A summary table of Precision, Recall, F1 (and support) for each class in your problem.
51
What information is included in a classification report?
* Class
* Precision
* Recall
* F1-Score
* Support
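Scikit-Learn produces this table directly; a small sketch with made-up labels (output_dict=True returns nested dicts instead of a printed string):

```python
from sklearn.metrics import classification_report

# Hypothetical true labels and predictions (1 = Yes, 0 = No).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# One row of precision/recall/F1/support per class.
report = classification_report(y_true, y_pred, output_dict=True)
```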
52
What is an ROC curve?
A plot that shows the trade-off between catching positives (True Positive Rate) and raising false alarms (False Positive Rate).
53
What do the X-axis and Y-axis represent in an ROC curve?
* X-axis: False Positive Rate = FP / (FP + TN)
* Y-axis: True Positive Rate = Recall = TP / (TP + FN)
54
What does AUC stand for and what does it represent?
Area Under the Curve; it summarizes how well your model separates classes across all thresholds.
55
What is the range of AUC values?
0.5 (no better than random guessing) to 1.0 (perfect separation).
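Scikit-Learn computes AUC from true labels and predicted scores; a minimal sketch with a hypothetical four-example set:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical labels and model scores (assumption for illustration).
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

# AUC summarises ranking quality across all thresholds
# (0.5 = random guessing, 1.0 = perfect separation).
auc = roc_auc_score(y_true, y_scores)
```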
56
True or False: The F1 score penalizes imbalances heavily.
True
57
Fill in the blank: A classification report is like a _______ listing grades for each subject.
report card
58
What is meant by the term 'inductive bias' in machine learning?
It refers to the assumptions made by an algorithm about how data behaves.
59
How do linear models interpret the relationship between inputs and outputs?
They assume a straight-line (or plane, hyperplane) relationship.
60
What are the strengths of decision trees and random forests?
They can capture non-linearities and interactions without hand-crafting features.
61
What is a major risk when using high-capacity models?
They risk overfitting by fitting any pattern, including noise.
62
What does the No-Free-Lunch Theorem state in machine learning?
No single algorithm is universally best across all possible problems.
63
What are the characteristics of low-capacity models?
They have few parameters and can’t learn highly complex patterns, leading to underfitting.
64
What are some practical constraints to consider when choosing algorithms?
Speed of training, memory usage, and interpretability.
65
How do algorithms differ in handling data characteristics?
They may perform differently based on dimensionality, feature types, and noise.
66
What must be tuned within a family of algorithms like random forests?
Hyperparameters such as number of trees, depth of trees, and learning rates.
67
True or False: A linear model is suitable for capturing complex feature interactions.
False.
68
Fill in the blank: The performance of an algorithm can be affected by the _______ of the problem.
structure
69
What is a key feature of neural networks?
They stack layers of weighted sums plus non-linear activations.
70
Why might deep neural networks require more resources?
They often need GPUs, large RAM, and expert tuning.
71
What do tree ensembles offer in terms of handling data?
They are robust to noise and outliers.
72
What type of data do some models handle better when there are thousands of features?
Wide data.
73
How can you visualize the difference between algorithms in machine learning?
Each algorithm can be compared to different tools used for drawing shapes.
74
What is the main trade-off when selecting algorithms?
Raw performance versus speed, cost, or explainability.
75
What is the role of hyperparameters in machine learning algorithms?
They fundamentally change how the algorithm 'sees' the data.
76
What is the core concept of machine learning?
Machine learning is about representing data as numbers.
77
How is raw data represented in machine learning?
Raw data is turned into vectors (arrays) of numbers.
78
How should tabular data be prepared for machine learning?
Leave numeric fields as-is; encode categories via one-hot, ordinal codes or embeddings.
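One-hot encoding can be sketched with pandas' get_dummies, assuming a hypothetical table with one numeric and one categorical column:

```python
import pandas as pd

# Hypothetical tabular data (column names are assumptions for illustration).
df = pd.DataFrame({
    "age": [29, 41, 63],                               # numeric: leave as-is
    "chest_pain": ["typical", "atypical", "typical"],  # category: encode
})

# One-hot encoding turns each category into its own 0/1 column.
encoded = pd.get_dummies(df, columns=["chest_pain"])
```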
79
How are words or sentences represented in machine learning?
Words or sentences are mapped to embeddings (e.g. Word2Vec, BERT vectors).
80
How are images treated in machine learning?
Each pixel (or convolutional feature) is treated as a numeric value.
81
What is the numeric representation of audio data in machine learning?
Audio is converted to spectrogram magnitudes or learned feature vectors.
82
What is the next step after representing data as numeric matrices?
Feeding those arrays into algorithms.
83
What do linear models in machine learning do?
Learn weight vectors w so that w·x ≈ y.
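A minimal NumPy sketch of this idea, using noiseless synthetic data generated from known weights so least squares can recover them exactly:

```python
import numpy as np

# Hypothetical data generated from y = 2*x1 + 3*x2 (assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, 3.0])

# Least squares finds the weight vector w so that X @ w ≈ y.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
```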
84
How do tree-based models operate?
They split numeric dimensions to partition data by label or value.
85
What do neural networks apply to extract patterns?
Layers of weighted sums and nonlinearities.
86
What is the purpose of optimization in machine learning?
To adjust model parameters (weights) to minimize a loss function.
87
What are common loss functions used in machine learning?
Mean-squared error, cross-entropy.
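Both losses are one-liners in NumPy; a sketch with hypothetical labels and predicted probabilities:

```python
import numpy as np

# Hypothetical labels and predicted probabilities.
y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.8])

# Mean-squared error: average squared difference (common for regression).
mse = np.mean((y_true - y_pred) ** 2)

# Binary cross-entropy: penalises confident wrong probabilities (classification).
cross_entropy = -np.mean(
    y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)
)
```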
88
What is the role of cross-validation in machine learning?
To check performance on held-out data and ensure patterns aren't just memorization.
89
What is feature engineering?
Deciding which numbers to feed the model or how to transform raw data into useful arrays.
90
What can yield bigger performance gains than tweaking the algorithm itself?
Better representations.
91
In the orchard analogy, what do the trees represent?
Data points.
92
In the orchard analogy, what do the branches represent?
Features.
93
What might happen if a crucial feature is not measured in the orchard analogy?
The judge (algorithm) might misclassify trees.
94
What additional components are involved in machine learning beyond just numbers?
Choosing model architecture, loss functions, regularization, hyperparameter tuning, deployment pipelines.
95
What is unsupervised learning in machine learning?
Learning that involves clustering and dimensionality reduction.
96
What is reinforcement learning in machine learning?
Decision-making in environments.
97
What can deep models learn directly without handcrafting every feature?
Representations (e.g. raw pixels → convolutional features).
98
What is the 'secret sauce' of machine learning?
Translating everything into numbers (arrays) and using algorithms to uncover patterns.
99
What factors are crucial for success in machine learning?
Which numbers you choose and how you train, validate, and deploy your pattern-finder.
100
101
What is the example scenario for Naive Bayes?
Sorting emails into SPORTS vs COOKING
102
How does Naive Bayes work?
Counts word frequencies in each bucket and multiplies these probabilities for all words
103
What is a strength of Naive Bayes?
Fast: Just counting words
104
What is a weakness of Naive Bayes?
Assumes words act alone: treats words separately
105
When should Naive Bayes be used?
Spam filtering, sentiment analysis, classifying short texts
106
What is the example scenario for Logistic Regression?
Predicting if a person will survive the Titanic based on age and ticket class
107
How does Logistic Regression work?
Factors pull toward 'Survived' or 'Did Not Survive' and are summed, then squished into a probability
108
What is a strength of Logistic Regression?
Simple: You can see how age or class affects the odds
109
What is a weakness of Logistic Regression?
Needs a straight-line boundary: struggles with complex combinations
110
When should Logistic Regression be used?
Simple yes/no predictions
111
What is the example scenario for k-Nearest Neighbors (k-NN)?
Classifying fruits based on weight and sweetness
112
How does k-NN work?
Looks at the k nearest neighbors to classify a new fruit
113
What is a strength of k-NN?
No training: Just look at the closest examples
114
What is a weakness of k-NN?
Slow for lots of data: Must check all fruits
115
When should k-NN be used?
Simple similarity-based tasks
116
What is the example scenario for Support Vector Machines (SVM)?
Separating cats vs dogs based on height and weight
117
How does SVM work?
Draws a line that maximizes the gap between two groups
118
What is a strength of SVM?
Powerful with clear margins: finds the perfect boundary
119
What is a weakness of SVM?
Slow: Needs to check all points
120
When should SVM be used?
Text categorization, image classification
121
What is the example scenario for Decision Tree?
Classifying animals using a 20 Questions game
122
How does a Decision Tree work?
Splits data step by step until it lands on a label
123
What is a strength of Decision Trees?
Easy to explain: can literally read the tree
124
What is a weakness of Decision Trees?
Overfitting: might memorize exceptions instead of general rules
125
When should Decision Trees be used?
Clear, rule-based decisions
126
What is the example scenario for Random Forest?
Predicting house price
127
How does Random Forest work?
Uses many small trees to vote on the price
128
What is a strength of Random Forest?
Reduces overfitting: a forest generalizes
129
What is a weakness of Random Forest?
Slower: many trees take time
130
When should Random Forest be used?
Predicting prices, detecting fraud
131
What is the example scenario for Gradient Boosted Trees?
Predicting if a customer will cancel a subscription
132
How does Gradient Boosted Trees work?
Builds trees step-by-step, each correcting the mistakes of the previous one
133
What is a strength of Gradient Boosted Trees?
Very accurate: learns complex patterns
134
What is a weakness of Gradient Boosted Trees?
Slow to train
135
When should Gradient Boosted Trees be used?
Fraud detection, forecasting
136
What is the example scenario for Neural Networks?
Recognizing handwritten digits
137
How do Neural Networks work?
Layers of pattern builders recognize full digits from simple strokes
138
What is a strength of Neural Networks?
Learns very complex patterns
139
What is a weakness of Neural Networks?
Needs a lot of data and computing power
140
When should Neural Networks be used?
Images, speech recognition, language models
141
What is the core takeaway regarding Naive Bayes and Logistic Regression?
Simple, fast, good for straightforward patterns
142
What is the core takeaway regarding k-NN and SVM?
Use similarity or clear margins
143
What is the core takeaway regarding Trees, Random Forest, and Boosting?
Great for structured, tabular data
144
What is the core takeaway regarding Neural Nets?
Best for very complex patterns like images and sound