PyTorch Flashcards

(81 cards)

1
Q

What is PyTorch's nn.Module base class, and how is it used?

A
  • It’s the base class for all NN modules
  • Automatic registration of all trainable parameters (PyTorch recognizes the components of your model)

Setup:
1. Create a class that inherits from torch.nn.Module
2. Define the __init__() method (for layers/submodules & setting up the architecture)
3. Define the forward() method (to specify the flow through the architecture)
4. Create an instance of your class and apply it to input
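The setup steps above can be sketched as a minimal example (the class name and layer sizes here are arbitrary placeholders):

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self, in_features: int = 4, hidden: int = 8, out_features: int = 2):
        super().__init__()
        # Layers assigned as attributes are registered automatically,
        # so their parameters appear in self.parameters().
        self.fc1 = nn.Linear(in_features, hidden)
        self.act = nn.ReLU()
        self.fc2 = nn.Linear(hidden, out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Specifies the flow of data through the architecture.
        return self.fc2(self.act(self.fc1(x)))

model = TinyNet()
y = model(torch.randn(3, 4))  # call the instance, not forward() directly
```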

2
Q

Examples of gradient-based optimizers in PyTorch

A
  • Stochastic Gradient Descent (good baseline)
  • Adam (adaptive learning rate)
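A minimal sketch of constructing both optimizers (the model and hyperparameters are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # placeholder model

# SGD: a solid baseline; momentum is optional.
sgd = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Adam: per-parameter adaptive learning rates.
adam = torch.optim.Adam(model.parameters(), lr=1e-3)
```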
3
Q

Loss function for regression tasks

A

MSE (Mean Squared Error)

Typically no output activation function.

4
Q

Loss function for classification tasks

A

Cross Entropy

Sigmoid (binary) or softmax (multi-class) output activation; note that torch.nn.CrossEntropyLoss applies log-softmax internally.

5
Q

What is Regularization and what are common examples?

A

Methods to counter overfitting

  • Dropout: randomly drop features or inputs during training
  • Weight penalty terms: add additional terms to the training loss (L1, L2)
  • Noise: add random noise to inputs or features
6
Q

What is a Confusion Matrix?

A

A table summarizing classification results: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN).
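A minimal pure-Python sketch of counting the four entries for binary labels (1 = positive; the label lists below are made up):

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

counts = confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
# → (2, 1, 1, 1): two hits, one correct rejection, one false alarm, one miss
```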

7
Q

Define False Positive (FP).

A

Samples wrongly classified as positive (false alarm, Type I error).

8
Q

Define False Negative (FN).

A

Samples wrongly classified as negative (miss, Type II error).

9
Q

Formula for True Positive Rate (TPR) = Recall = Sensitivity.

A

“How many true cases did I find?”

TPR = TP / (TP + FN)

10
Q

Formula for True Negative Rate (TNR) = Specificity

A

“How well do I reject negatives?”

TNR = TN / (TN + FP)

11
Q

Formula for False Negative Rate (FNR).

A

FNR = FN / (FN + TP) = 1 - TPR.

12
Q

Formula for False Positive Rate (FPR).

A

FPR = FP / (FP + TN) = 1 - TNR.

13
Q

Formula for Accuracy (ACC).

A

Overall correctness (can be misleading for imbalanced classes).
ACC = (TP + TN) / (TP + TN + FP + FN).

14
Q

Formula for Balanced Accuracy (BACC).

A

BACC = (TPR + TNR) / 2;
average of TPR & TNR (good for imbalance)

15
Q

What is ROC and AUC?

A

ROC curve shows TPR vs FPR across thresholds; AUC is the area under ROC (0.5 random, 1 perfect).

16
Q

Formula for Positive Predictive Value (Precision).

A

“How precise are my alarms?”
PPV = TP / (TP + FP).

17
Q

Formula for F1-score.

A

Harmonic mean of Precision & Recall (balances the two).

F1 = 2 * (Precision * Recall) / (Precision + Recall) = 2TP / (2TP + FP + FN).
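A quick check of the two equivalent forms in plain Python (the TP/FP/FN counts are made up):

```python
def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

# Made-up counts: TP=8, FP=2, FN=8.
precision = 8 / (8 + 2)   # 0.8
recall = 8 / (8 + 8)      # 0.5
f1 = f1_score(8, 2, 8)    # same as 2*P*R / (P + R)
```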

18
Q

Formula for Fβ-score.

A

Fβ = (1 + β²) * (Precision * Recall) / (β² * Precision + Recall).

19
Q

Formula for Mean Absolute Error (MAE).

A

MAE = (Σ|y - ŷ|) / n.

20
Q

Formula for Mean Squared Error (MSE).

A

MSE = (Σ(y - ŷ)²) / n.

21
Q

Formula for Root Mean Squared Error (RMSE).

A

RMSE = sqrt(MSE).

22
Q

What does MSE emphasize?

A

Squaring the errors weights large deviations heavily, so MSE emphasizes outliers.

23
Q

What is a Hold-Out Test Set?

A

Split dataset into train, validation, and test sets; test set provides independent performance estimate.

24
Q

What is Cross Validation (CV)?

A

Split data into n folds; train n times leaving one fold out each time; average validation risk to estimate generalization.

25

Q

What is Transfer Learning?

A

Reusing knowledge from a model pre-trained on a large source task for a smaller target task.

26

Q

Two main approaches to Transfer Learning.

A

  • Option 1: Feature extractor (freeze the backbone, train a new head)
  • Option 2: Fine-tuning (train more layers or the entire model)

27

Q

When to prefer fine-tuning over feature extraction?

A

If the domain differs significantly, enough data is available, or Option 1 fails to achieve the desired accuracy.

28

Q

Why use Transfer Learning?

A

Saves compute, data, and time; pre-trained models (e.g., on HuggingFace) are readily available.

29

Q

What does the sigmoid function do?

A

Also called the logistic function: it maps the output to (0, 1). The activation after sigmoid can be interpreted as the probability of belonging to the positive class.
30

Q

Why use PyTorch for machine learning (e.g., instead of NumPy)?

A

  • GPU support (NumPy is CPU-only)
  • Dynamic computation graphs with autograd
  • A Pythonic, NumPy-like API
  • A large ecosystem (TorchVision, TorchAudio)

31

Q

What is a Tensor in PyTorch?

A

The core data structure, similar to NumPy arrays but with GPU and autograd support.

32

Q

Difference between RAM and VRAM in PyTorch?

A

VRAM is GPU memory; PyTorch allows moving data to VRAM for parallel computation on the GPU.
33

Q

What is the goal of training a neural network?

A

To choose weights so the model output ŷ approximates the true target y by minimizing a loss function L(ŷ, y).

34

Q

Why are iterative methods used in training?

A

Because we usually cannot compute the minimum of the loss directly.

35

Q

How does gradient-based optimization work?

A

Compute the loss → compute gradients w.r.t. the weights → update the weights by subtracting the learning rate times the gradient (θ ← θ − η∇L).
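The update rule θ ← θ − η∇L can be sketched with autograd on a toy scalar problem (the loss function and learning rate are made up for illustration):

```python
import torch

# Toy problem: minimize L(theta) = (theta - 3)^2, whose minimum is theta = 3.
theta = torch.tensor(0.0, requires_grad=True)
lr = 0.1  # learning rate (eta)

for _ in range(100):
    loss = (theta - 3.0) ** 2    # compute the loss
    loss.backward()              # compute the gradient dL/dtheta
    with torch.no_grad():
        theta -= lr * theta.grad # theta <- theta - eta * gradient
    theta.grad.zero_()           # clear the gradient for the next iteration
```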
36

Q

What is torch.utils.data.Dataset used for?

A

To create custom datasets with a standardized interface for accessing and optionally transforming data samples (e.g., preprocessing or augmentation). A subclass must implement __len__() and __getitem__().

37

Q

What does __getitem__() in a Dataset return?

A

  • A single sample (e.g., image, label, ID), with optional preprocessing and augmentation
  • The shape and type of a single sample can be derived from reading the __getitem__() method
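A minimal sketch of a custom Dataset that returns (ID, features, label); the data here is synthetic:

```python
import torch
from torch.utils.data import Dataset

class ToyDataset(Dataset):
    """Minimal map-style dataset over synthetic data."""

    def __init__(self, n: int = 10):
        self.x = torch.randn(n, 4)            # features
        self.y = torch.randint(0, 2, (n,))    # binary labels

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        # Return (sample_id, features, label); the ID helps debugging.
        return idx, self.x[idx], self.y[idx]

ds = ToyDataset()
sample_id, features, label = ds[3]
```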
38

Q

What is torch.utils.data.DataLoader?

A

A DataLoader wraps a Dataset and makes it possible to:
  • Iterate in minibatches (not one item at a time)
  • Shuffle the data each epoch
  • Use multiple workers (processes) to load data in parallel
  • Apply collate logic to combine samples into a batch tensor

39

Q

How to customize batching in DataLoader?

A

Use the collate_fn argument: it combines the samples extracted by __getitem__() into a single minibatch.

40

Q

Typical steps to use DataLoader?

A

1. Derive a Dataset class with __getitem__() and __len__().
2. Create a dataset instance.
3. Create Subset splits.
4. Create a DataLoader with batch_size, shuffle, and num_workers.
5. Loop over the DataLoader to get minibatches.
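The five steps can be sketched as follows (TensorDataset stands in for a custom Dataset class; all sizes are arbitrary):

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Steps 1-2: a dataset instance (synthetic features and labels).
data = TensorDataset(torch.randn(100, 4), torch.randint(0, 2, (100,)))

# Step 3: fixed index splits via Subset.
train_set = Subset(data, list(range(80)))
val_set = Subset(data, list(range(80, 100)))

# Step 4: DataLoader with batch_size, shuffle, num_workers.
train_loader = DataLoader(train_set, batch_size=16, shuffle=True, num_workers=0)

# Step 5: loop over the loader to get minibatches.
n_batches = 0
for xb, yb in train_loader:
    n_batches += 1
```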
41

Q

Best practice regarding shuffle for validation/test sets in DataLoader?

A

Disable shuffling, because evaluation does not depend on sample order.

42

Q

How to ensure reproducibility with DataLoader?

A

1. Store the dataset split indices.
2. Use torch.utils.data.Subset (i.e., fix the indices).
3. Set random seeds.

43

Q

Do you need to convert samples to tensors inside the Dataset?

A

No; the DataLoader's default collate function can handle NumPy arrays and convert them to tensors.

44

Q

Why include a sample ID in the Dataset return value?

A

For debugging and traceability.
45

Q

What is torch.nn.Module?

A

The base class for all neural networks and layers; it registers parameters and submodules automatically.

46

Q

Which methods must a custom nn.Module implement?

A

__init__() to define the layers and forward() to specify how input is transformed into output.

47

Q

Why call the module instance directly instead of forward()?

A

Calling the instance runs forward() and triggers any registered hooks automatically.

48

Q

How to move modules or parameters to the GPU?

A

Use to(device=...) to move them to a GPU device.

49

Q

Where to find many predefined modules?

A

The torch.nn library (https://pytorch.org/docs/stable/nn.html).
50

Q

Describe a Fully-Connected Feed-Forward Neural Network (FFNN).

A

Every neuron is connected to all neurons in the next layer; high complexity; typically requires flattening multidimensional input.

51

Q

Describe a Convolutional Neural Network (CNN).

A

Applies kernels (filters) across the input dimensions to capture spatial or temporal patterns, often followed by pooling and flattening.

52

Q

Difference between 1D, 2D, and nD CNNs?

A

1D for sequences (e.g., time series), 2D for images; higher dimensions are possible but rare.

53

Q

What is a Recurrent Neural Network (RNN)?

A

Processes sequential inputs, reusing the same weights and a hidden state; LSTMs are a common variant, and Transformers are the modern alternative for sequence data.
54

Q

Key properties of ReLU.

A

Computationally cheap and avoids vanishing gradients; the most common activation.

55

Q

Key properties of SELU.

A

Self-normalizing, which improves training stability.

56

Q

Key properties of Sigmoid.

A

Outputs in (0, 1); an older activation, prone to vanishing gradients.

57

Q

What is autograd?

A

PyTorch's automatic differentiation engine: it builds computation graphs and computes gradients via backward().
58

Q

Which method computes gradients of a loss tensor?

A

loss.backward()

59

Q

Which optimizers are common in PyTorch?

A

torch.optim.SGD (with optional momentum) and torch.optim.Adam (adaptive learning rates and momentum).

60

Q

Steps in a weight update cycle?

A

1. optimizer.zero_grad()
2. loss.backward()
3. optimizer.step()
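The three steps in the context of one training iteration (the model, data, and loss here are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x, y = torch.randn(8, 4), torch.randn(8, 1)  # synthetic minibatch

optimizer.zero_grad()         # 1. clear old gradients
loss = loss_fn(model(x), y)   # forward pass and loss
loss.backward()               # 2. compute gradients
optimizer.step()              # 3. update weights
```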
61

Q

Common loss functions for regression.

A

Mean Squared Error (torch.nn.MSELoss), typically with no output activation.

62

Q

Common loss functions for classification.

A

Cross-Entropy (torch.nn.CrossEntropyLoss) or, for binary tasks, BCEWithLogitsLoss (which includes the sigmoid internally).

63

Q

Why is BCEWithLogitsLoss used instead of sigmoid + BCELoss?

A

It combines the sigmoid and binary cross-entropy in one operation for numerical stability.
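A quick numerical check of the equivalence (the logits and targets are made up):

```python
import torch
import torch.nn as nn

logits = torch.tensor([2.0, -1.0, 0.5])
targets = torch.tensor([1.0, 0.0, 1.0])

# Numerically stable: the sigmoid is fused into the loss.
stable = nn.BCEWithLogitsLoss()(logits, targets)

# Mathematically equivalent but less stable two-step version.
unstable = nn.BCELoss()(torch.sigmoid(logits), targets)
```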
64

Q

Difference between update and epoch.

A

Update = one weight update; epoch = one full pass through the training data.

65

Q

What is early stopping?

A

Stop training when the validation loss does not improve for m updates/epochs, and use the model with the best validation loss.
66

Q

What are good classification metrics for imbalanced classes?

A

  • Balanced Accuracy (BACC)
  • AUC (ranking ability across thresholds)
  • F1 (balances precision & recall)
  • Fβ (tunes recall vs. precision)

67

Q

Purpose of monitoring tools like TensorBoard.

A

Visualize losses, weights, and gradients; monitor training and validation performance.

68

Q

First step if the model is not learning.

A

Check whether gradients and weights are changing; consider gradient clipping.

69

Q

Recommended workflow before regularizing.

A

First find a model that can overfit, then make it smaller or add regularization.
70

Q

What does the Softmax function do?

A

Converts logits into class probabilities in multi-class classification; often used together with CrossEntropyLoss (which applies log-softmax internally).
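A plain-Python sketch of softmax, using the standard max-subtraction trick for numerical stability (the logits below are made up):

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1.
    Subtracting the max first avoids overflow in exp()."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```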
71

Q

What's the output range of Tanh, sigmoid, softmax, and ReLU?

A

  • Tanh: (−1, 1). Zero-centered output, helpful for RNNs or mean-centered data; can still saturate.
  • Sigmoid: (0, 1). Good for probabilities in a binary classification output layer.
  • Softmax: (0, 1) per class, with all outputs summing to 1.
  • ReLU: [0, ∞). The default in most modern nets; fast, sparse activations. Watch out for “dying ReLU”.

72

Q

Typical normalization approaches

A

  • Min-Max scaling (to [0, 1] or [−1, 1])
  • Standardization (mean = 0, std = 1; the default for most ML models)
  • Robust scaling (median = 0, IQR = 1)

73

Q

What can be the reasons if the training loss increases?

A

It means the model predictions are getting worse (larger error). Possible reasons:
  • Learning rate too high: the optimizer overshoots minima
  • Bad initialization or exploding gradients
  • A bug in the loss computation or data pipeline
  • A sudden change in the data distribution (e.g., reshuffling with different data properties)
74

Q

Image normalization: min-max scaling formula?

A

(img - img.min()) / (img.max() - img.min()), i.e., (x - min) / (max - min)

75

Q

z-score formula for normalization

A

Goal: center the data and give it unit variance.
(x - mean) / std
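Both formulas in plain Python (the input values are toy data; population std assumed for the z-score):

```python
def min_max_scale(xs):
    """Scale values to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def z_score(xs):
    """Standardize to mean 0 and (population) std 1: (x - mean) / std."""
    mean = sum(xs) / len(xs)
    std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mean) / std for x in xs]

scaled = min_max_scale([2.0, 4.0, 6.0])        # → [0.0, 0.5, 1.0]
standardized = z_score([2.0, 4.0, 6.0])        # mean 0, unit variance
```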
76

Q

Types of gradient descent / training procedures

A

  • Full-batch (a.k.a. batch gradient descent): uses all training samples to compute the gradient, making one parameter update per epoch.
  • Mini-batch gradient descent: splits the dataset into smaller batches (e.g., 32 or 64 samples); each batch produces one gradient estimate and one update.
  • Online / stochastic gradient descent (SGD): uses a batch size of 1 sample for each update.
77

Q

What are the two Dataset types in PyTorch?

A

Map-style datasets and iterable-style datasets.

78

Q

What's the difference between torch.jit.trace and torch.jit.script?

A

torch.jit.trace “replays” the route taken by the example input; it cannot see data-dependent branching (if statements), and it does not keep the example input. torch.jit.script keeps Python control flow and conditions alive.
79

Q

What does torch.utils.data.Subset do?

A

It wraps an existing Dataset and exposes only the samples whose indices you pass in:
subset = Subset(dataset, indices)
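A minimal sketch (TensorDataset and the index list are arbitrary placeholders):

```python
import torch
from torch.utils.data import Subset, TensorDataset

data = TensorDataset(torch.arange(10).float())

# Expose only the samples at indices 2, 5, and 7 of the underlying dataset.
subset = Subset(data, [2, 5, 7])

length = len(subset)   # 3
first = subset[0]      # the sample at original index 2
```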
80

Q

What's the default input layout of images in PyTorch?

A

N × C × H × W: number of images × number of channels × height × width.

81

Q

What does the “batch shape” tuple consist of?

A

(batch size, sample shape)