Machine Learning Flashcards

(144 cards)

1
Q

What is exploratory data analysis (EDA)?

A

The process of going through a dataset and discovering more about it.

EDA helps in understanding the data’s structure, patterns, and anomalies.

2
Q

What is model training?

A

Creating model(s) that learn to predict a target variable based on other variables.

This involves using training data to adjust the model parameters.

3
Q

What does model evaluation entail?

A

Evaluating a model’s predictions using problem-specific evaluation metrics.

Common metrics include accuracy, precision, recall, and F1 score.

4
Q

What is the purpose of model comparison?

A

Comparing several different models to find the best one.

This helps in identifying which model performs best under given conditions.

5
Q

What is model hyperparameter tuning?

A

Tweaking a model’s hyperparameters to improve it after finding a good model.

Hyperparameters are settings that are not learned from the data but are set prior to training.

6
Q

What does feature importance refer to?

A

Identifying the features/characteristics that matter most for predicting the target variable (e.g. heart disease, in a heart-disease project).

Feature importance helps in understanding which variables have the most influence on the prediction.

7
Q

What is cross-validation?

A

A method to ensure a good model works on unseen data.

It involves partitioning the dataset into subsets to evaluate the model’s performance.

8
Q

What should be included in reporting what we’ve found?

A

Presenting the work and findings in a clear manner.

This may include visualizations, key metrics, and insights derived from the analysis.

9
Q

What are attributes in the context of predictive modeling?

A

Attributes are the variables used to predict the target variable

10
Q

What are attributes also referred to as?

A

Independent variables

11
Q

What is the target variable in predictive modeling?

A

The dependent variable

12
Q

Fill in the blank: Attributes are also called _______.

A

features

13
Q

True or False: The target variable can also be referred to as an independent variable.

A

False

14
Q

What is an evaluation metric?

A

An evaluation metric is a problem-specific measure of model performance (e.g. accuracy), usually defined at the start of a project.

15
Q

Why can evaluation metrics change over time?

A

Because machine learning is very experimental.

16
Q

What example goal might a project start with?

A

Reach 95% accuracy at predicting whether or not a patient has heart disease.

17
Q

What is the purpose of setting a goal for machine learning engineers?

A

It provides a rough goal to work towards.

18
Q

What may happen to the project goal as it progresses?

A

It may have to be adjusted based on real-world testing.

19
Q

What are features in the context of data?

A

Different parts and characteristics of the data.

20
Q

What should you do during the step of identifying important features?

A

Start exploring what each portion of the data relates to and create a reference.

21
Q

What is a common way to document features of data?

A

Create a data dictionary.

22
Q

What is a data dictionary?

A

A data dictionary describes the data you’re dealing with.

It provides metadata about the data elements in a dataset.

23
Q

Do all datasets come with a data dictionary?

A

No, not all datasets come with data dictionaries.

This may require additional research or consultation with a subject matter expert.

24
Q

What should you do if a dataset does not have a data dictionary?

A

You may have to do your research or ask a subject matter expert.

A subject matter expert is someone who knows about the data.

25
Which library is commonly used for data analysis?
pandas

pandas is a powerful data manipulation and analysis library for Python.
26
What library is typically employed for numerical operations?
NumPy

NumPy provides support for large multi-dimensional arrays and matrices, along with a collection of mathematical functions.
27
What are the libraries used for data visualization?
Matplotlib, seaborn

Matplotlib is a plotting library for the Python programming language, and seaborn is built on top of Matplotlib for making statistical graphics.
28
Which library is utilized for machine learning modelling and evaluation?
Scikit-Learn

Scikit-Learn is a machine learning library that provides simple and efficient tools for data mining and data analysis.
29
Fill in the blank: For data analysis, you will likely use _______.
pandas
30
Fill in the blank: For numerical operations, the library of choice is _______.
NumPy
31
True or False: Matplotlib is used for machine learning modelling.
False

Matplotlib is primarily used for data visualization, not machine learning.
32
True or False: Scikit-Learn is a library for machine learning.
True
33
What does df.shape return?
A tuple of (rows, columns): df.shape # (rows, columns)
34
Once you've analysed the data (structured data), how do you start with the training of the model?
We're trying to predict our target/result/output variable using all of the other variables. To do this, we split the target variable from the rest by creating: X - our features (all variables except the target column), using df.drop("target", axis=1). y - our target variable, using df.target.to_numpy() (this extracts the target column as a NumPy array).
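The split above can be sketched with a tiny hypothetical DataFrame (the column names and values are assumptions for illustration):

```python
import pandas as pd

# A tiny hypothetical dataset with a "target" column.
df = pd.DataFrame({
    "age": [29, 54, 41, 63],
    "chol": [204, 250, 221, 286],
    "target": [0, 1, 0, 1],
})

# X: every column except the target; y: the target column as a NumPy array.
X = df.drop("target", axis=1)
y = df.target.to_numpy()
```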
35
Choosing a model
36
What is hyperparameter tuning?
The process of finding the best 'knobs' on a model before it learns from data.

Analogous to adjusting oven temperature and baking time in a cookie recipe to achieve the best outcome.
37
What are some examples of hyperparameters to tune?
* Learning rate
* Number of trees
* Regularization strength
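One common way to search over such settings is a small grid search; a minimal Scikit-Learn sketch, assuming synthetic data and an arbitrary pair of random-forest hyperparameters:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data purely for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Try a small grid of hyperparameter values ("knobs" set before training).
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [10, 50], "max_depth": [2, None]},
    cv=3,
)
grid.fit(X, y)
best_params = grid.best_params_  # the winning knob settings
```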
38
What is feature importance?
A measure of which inputs (features) the model relies on most heavily to make its decisions.

Similar to ranking clues in a mystery by their importance.
39
How do you determine feature importance?
Remove or shuffle one clue at a time and see how much the model's accuracy drops.
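The "shuffle one clue at a time" idea is what Scikit-Learn's permutation importance implements; a minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic data purely for illustration.
X, y = make_classification(n_samples=300, n_features=4, n_informative=2,
                           random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle each feature in turn and record how much the score drops.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
importances = result.importances_mean  # one score drop per feature
```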
40
What is a confusion matrix?
A table that shows how well a classifier got each decision right or wrong.
41
What do the terms TP, FP, FN, and TN represent in a confusion matrix?
* TP (True Positive): Model said 'Yes' and it was Yes
* FP (False Positive): Model said 'Yes' but it was No
* FN (False Negative): Model said 'No' but it was Yes
* TN (True Negative): Model said 'No' and it was No
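These four counts can be read straight off Scikit-Learn's confusion matrix; a small sketch with made-up labels:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predictions (1 = Yes, 0 = No).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels 0/1, sklearn orders the matrix [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
```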
42
What is cross-validation?
A way to test a model’s reliability by training and testing it on different slices of the data.
43
What is the process for cross-validation?
* Split data into k groups (folds)
* Train on k-1 folds, test on the remaining fold
* Average the results
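The three steps above are exactly what Scikit-Learn's cross_val_score does; a minimal sketch, assuming synthetic data and a logistic-regression model:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data purely for illustration.
X, y = make_classification(n_samples=150, random_state=1)

# 5-fold CV: train on 4 folds, test on the held-out fold, repeat, average.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
mean_score = scores.mean()
```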
44
What is precision?
Of all the times the model said 'Yes,' how many were actually Yes?
45
How is precision calculated?
$$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}$$
46
What is recall?
Of all the real Yes cases, how many did the model catch?
47
How is recall calculated?
$$\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}$$
48
What is the F1 score?
A single number that balances Precision and Recall when you care about both.
49
How is the F1 score calculated?
$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
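A quick arithmetic check of the three formulas above, using hypothetical counts TP=3, FP=1, FN=1:

```python
# Hypothetical confusion-matrix counts (assumption for illustration).
tp, fp, fn = 3, 1, 1

precision = tp / (tp + fp)  # 3/4 = 0.75
recall = tp / (tp + fn)     # 3/4 = 0.75
f1 = 2 * precision * recall / (precision + recall)  # also 0.75 here
```

When precision and recall are equal, F1 equals them both; the harmonic mean only drags the score down when the two disagree.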
50
What is a classification report?
A summary table of Precision, Recall, F1 (and support) for each class in your problem.
51
What information is included in a classification report?
* Class
* Precision
* Recall
* F1-Score
* Support
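Scikit-Learn produces this table directly; a small sketch with made-up labels (output_dict=True returns nested dicts instead of a printed string):

```python
from sklearn.metrics import classification_report

# Hypothetical true labels and predictions (1 = Yes, 0 = No).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# One row of precision/recall/F1/support per class.
report = classification_report(y_true, y_pred, output_dict=True)
```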
52
What is an ROC curve?
A plot that shows the trade-off between catching positives (True Positive Rate) and raising false alarms (False Positive Rate).
53
What do the X-axis and Y-axis represent in an ROC curve?
* X-axis: False Positive Rate = FP / (FP + TN)
* Y-axis: True Positive Rate = Recall = TP / (TP + FN)
54
What does AUC stand for and what does it represent?
Area Under the Curve; it summarizes how well your model separates classes across all thresholds.
55
What is the range of AUC values?
0.5 (no better than random guessing) to 1.0 (perfect separation).
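Scikit-Learn computes AUC from true labels and predicted scores; a minimal sketch with a hypothetical four-example set:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical labels and model scores (assumption for illustration).
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

# AUC summarises ranking quality across all thresholds
# (0.5 = random guessing, 1.0 = perfect separation).
auc = roc_auc_score(y_true, y_scores)
```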
56
True or False: The F1 score penalizes imbalances heavily.
True
57
Fill in the blank: A classification report is like a _______ listing grades for each subject.
report card
58
What is meant by the term 'inductive bias' in machine learning?
It refers to the assumptions made by an algorithm about how data behaves.
59
How do linear models interpret the relationship between inputs and outputs?
They assume a straight-line (or plane, hyperplane) relationship.
60
What are the strengths of decision trees and random forests?
They can capture non-linearities and interactions without hand-crafting features.
61
What is a major risk when using high-capacity models?
They risk overfitting by fitting any pattern, including noise.
62
What does the No-Free-Lunch Theorem state in machine learning?
No single algorithm is universally best across all possible problems.
63
What are the characteristics of low-capacity models?
They have few parameters and can’t learn highly complex patterns, leading to underfitting.
64
What are some practical constraints to consider when choosing algorithms?
Speed of training, memory usage, and interpretability.
65
How do algorithms differ in handling data characteristics?
They may perform differently based on dimensionality, feature types, and noise.
66
What must be tuned within a family of algorithms like random forests?
Hyperparameters such as number of trees, depth of trees, and learning rates.
67
True or False: A linear model is suitable for capturing complex feature interactions.
False.
68
Fill in the blank: The performance of an algorithm can be affected by the _______ of the problem.
structure
69
What is a key feature of neural networks?
They stack layers of weighted sums plus non-linear activations.
70
Why might deep neural networks require more resources?
They often need GPUs, large RAM, and expert tuning.
71
What do tree ensembles offer in terms of handling data?
They are robust to noise and outliers.
72
What type of data do some models handle better when there are thousands of features?
Wide data.
73
How can you visualize the difference between algorithms in machine learning?
Each algorithm can be compared to different tools used for drawing shapes.
74
What is the main trade-off when selecting algorithms?
Raw performance versus speed, cost, or explainability.
75
What is the role of hyperparameters in machine learning algorithms?
They fundamentally change how the algorithm 'sees' the data.
76
What is the core concept of machine learning?
Machine learning is about representing data as numbers.
77
How is raw data represented in machine learning?
Raw data is turned into vectors (arrays) of numbers.
78
How should tabular data be prepared for machine learning?
Leave numeric fields as-is; encode categories via one-hot, ordinal codes or embeddings.
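One-hot encoding can be sketched with pandas' get_dummies, assuming a hypothetical table with one numeric and one categorical column:

```python
import pandas as pd

# Hypothetical tabular data (column names are assumptions for illustration).
df = pd.DataFrame({
    "age": [29, 41, 63],                               # numeric: leave as-is
    "chest_pain": ["typical", "atypical", "typical"],  # category: encode
})

# One-hot encoding turns each category into its own 0/1 column.
encoded = pd.get_dummies(df, columns=["chest_pain"])
```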
79
How are words or sentences represented in machine learning?
Words or sentences are mapped to embeddings (e.g. Word2Vec, BERT vectors).
80
How are images treated in machine learning?
Each pixel (or convolutional feature) is treated as a numeric value.
81
What is the numeric representation of audio data in machine learning?
Audio is converted to spectrogram magnitudes or learned feature vectors.
82
What is the next step after representing data as numeric matrices?
Feeding those arrays into algorithms.
83
What do linear models in machine learning do?
Learn weight vectors w so that w·x ≈ y.
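A minimal NumPy sketch of this idea, using noiseless synthetic data generated from known weights so least squares can recover them exactly:

```python
import numpy as np

# Hypothetical data generated from y = 2*x1 + 3*x2 (assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, 3.0])

# Least squares finds the weight vector w so that X @ w ≈ y.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
```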
84
How do tree-based models operate?
They split numeric dimensions to partition data by label or value.
85
What do neural networks apply to extract patterns?
Layers of weighted sums and nonlinearities.
86
What is the purpose of optimization in machine learning?
To adjust model parameters (weights) to minimize a loss function.
87
What are common loss functions used in machine learning?
Mean-squared error, cross-entropy.
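Both losses are one-liners in NumPy; a sketch with hypothetical labels and predicted probabilities:

```python
import numpy as np

# Hypothetical labels and predicted probabilities.
y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.8])

# Mean-squared error: average squared difference (common for regression).
mse = np.mean((y_true - y_pred) ** 2)

# Binary cross-entropy: penalises confident wrong probabilities (classification).
cross_entropy = -np.mean(
    y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)
)
```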
88
What is the role of cross-validation in machine learning?
To check performance on held-out data and ensure patterns aren't just memorization.
89
What is feature engineering?
Deciding which numbers to feed the model or how to transform raw data into useful arrays.
90
What can yield bigger performance gains than tweaking the algorithm itself?
Better representations.
91
In the orchard analogy, what do the trees represent?
Data points.
92
In the orchard analogy, what do the branches represent?
Features.
93
What might happen if a crucial feature is not measured in the orchard analogy?
The judge (algorithm) might misclassify trees.
94
What additional components are involved in machine learning beyond just numbers?
Choosing model architecture, loss functions, regularization, hyperparameter tuning, deployment pipelines.
95
What is unsupervised learning in machine learning?
Learning that involves clustering and dimensionality reduction.
96
What is reinforcement learning in machine learning?
Decision-making in environments.
97
What can deep models learn directly without handcrafting every feature?
Representations (e.g. raw pixels → convolutional features).
98
What is the 'secret sauce' of machine learning?
Translating everything into numbers (arrays) and using algorithms to uncover patterns.
99
What factors are crucial for success in machine learning?
Which numbers you choose and how you train, validate, and deploy your pattern-finder.
100
101
What is the example scenario for Naive Bayes?
Sorting emails into SPORTS vs COOKING
102
How does Naive Bayes work?
Counts word frequencies in each bucket and multiplies these probabilities for all words
103
What is a strength of Naive Bayes?
Fast: Just counting words
104
What is a weakness of Naive Bayes?
Assumes words act alone: treats words separately
105
When should Naive Bayes be used?
Spam filtering, sentiment analysis, classifying short texts
106
What is the example scenario for Logistic Regression?
Predicting if a person will survive the Titanic based on age and ticket class
107
How does Logistic Regression work?
Factors pull toward 'Survived' or 'Did Not Survive' and are summed, then squished into a probability
108
What is a strength of Logistic Regression?
Simple: You can see how age or class affects the odds
109
What is a weakness of Logistic Regression?
Needs a straight-line boundary: struggles with complex combinations
110
When should Logistic Regression be used?
Simple yes/no predictions
111
What is the example scenario for k-Nearest Neighbors (k-NN)?
Classifying fruits based on weight and sweetness
112
How does k-NN work?
Looks at the k nearest neighbors to classify a new fruit
113
What is a strength of k-NN?
No training: Just look at the closest examples
114
What is a weakness of k-NN?
Slow for lots of data: Must check all fruits
115
When should k-NN be used?
Simple similarity-based tasks
116
What is the example scenario for Support Vector Machines (SVM)?
Separating cats vs dogs based on height and weight
117
How does SVM work?
Draws a line that maximizes the gap between two groups
118
What is a strength of SVM?
Powerful with clear margins: finds the perfect boundary
119
What is a weakness of SVM?
Slow: Needs to check all points
120
When should SVM be used?
Text categorization, image classification
121
What is the example scenario for Decision Tree?
Classifying animals using a 20 Questions game
122
How does a Decision Tree work?
Splits data step by step until it lands on a label
123
What is a strength of Decision Trees?
Easy to explain: can literally read the tree
124
What is a weakness of Decision Trees?
Overfitting: might memorize exceptions instead of general rules
125
When should Decision Trees be used?
Clear, rule-based decisions
126
What is the example scenario for Random Forest?
Predicting house price
127
How does Random Forest work?
Uses many small trees to vote on the price
128
What is a strength of Random Forest?
Reduces overfitting: a forest generalizes
129
What is a weakness of Random Forest?
Slower: many trees take time
130
When should Random Forest be used?
Predicting prices, detecting fraud
131
What is the example scenario for Gradient Boosted Trees?
Predicting if a customer will cancel a subscription
132
How does Gradient Boosted Trees work?
Builds trees step-by-step, each correcting the mistakes of the previous one
133
What is a strength of Gradient Boosted Trees?
Very accurate: learns complex patterns
134
What is a weakness of Gradient Boosted Trees?
Slow to train
135
When should Gradient Boosted Trees be used?
Fraud detection, forecasting
136
What is the example scenario for Neural Networks?
Recognizing handwritten digits
137
How do Neural Networks work?
Layers of pattern builders recognize full digits from simple strokes
138
What is a strength of Neural Networks?
Learns very complex patterns
139
What is a weakness of Neural Networks?
Needs a lot of data and computing power
140
When should Neural Networks be used?
Images, speech recognition, language models
141
What is the core takeaway regarding Naive Bayes and Logistic Regression?
Simple, fast, good for straightforward patterns
142
What is the core takeaway regarding k-NN and SVM?
Use similarity or clear margins
143
What is the core takeaway regarding Trees, Random Forest, and Boosting?
Great for structured, tabular data
144
What is the core takeaway regarding Neural Nets?
Best for very complex patterns like images and sound