PyTorch Flashcards

(81 cards)

1
Q

What is PyTorch's nn.Module base class, and how is it used?

A
  • It’s the base class for all NN modules
  • Automatic registration of all trainable parameters (PyTorch recognizes the components of your model)

Setup:
1. Create a class that inherits from torch.nn.Module
2. Define the __init__() method (for layers/submodules & setting up the architecture)
3. Define the forward() method (to specify the flow through the architecture)
4. Create an instance of your class and apply it to input
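The setup steps above can be sketched as a minimal example (the class name and layer sizes here are arbitrary placeholders):

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self, in_features: int = 4, hidden: int = 8, out_features: int = 2):
        super().__init__()
        # Layers assigned as attributes are registered automatically,
        # so their parameters appear in self.parameters().
        self.fc1 = nn.Linear(in_features, hidden)
        self.act = nn.ReLU()
        self.fc2 = nn.Linear(hidden, out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Specifies the flow of data through the architecture.
        return self.fc2(self.act(self.fc1(x)))

model = TinyNet()
y = model(torch.randn(3, 4))  # call the instance, not forward() directly
```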

2
Q

Examples of gradient-based optimizers in PyTorch

A
  • Stochastic Gradient Descent (good baseline)
  • Adam (adaptive learning rate)
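A minimal sketch of constructing both optimizers (the model and hyperparameters are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # placeholder model

# SGD: a solid baseline; momentum is optional.
sgd = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Adam: per-parameter adaptive learning rates.
adam = torch.optim.Adam(model.parameters(), lr=1e-3)
```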
3
Q

Loss function for regression tasks

A

MSE (Mean Squared Error)

Typically no output activation function.

4
Q

Loss function for classification tasks

A

Cross Entropy

Sigmoid (binary) or softmax (multi-class) output activation; note that torch.nn.CrossEntropyLoss applies log-softmax internally.

5
Q

What is Regularization and what are common examples?

A

Methods to counter overfitting

  • Dropout: randomly drop features or inputs during training
  • Weight penalty terms: add additional terms to the training loss (L1, L2)
  • Noise: add random noise to inputs or features
6
Q

What is a Confusion Matrix?

A

A table summarizing classification results: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN).
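A minimal pure-Python sketch of counting the four entries for binary labels (1 = positive; the label lists below are made up):

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

counts = confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
# → (2, 1, 1, 1): two hits, one correct rejection, one false alarm, one miss
```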

7
Q

Define False Positive (FP).

A

Samples wrongly classified as positive (false alarm, Type I error).

8
Q

Define False Negative (FN).

A

Samples wrongly classified as negative (miss, Type II error).

9
Q

Formula for True Positive Rate (TPR) = Recall = Sensitivity.

A

“How many true cases did I find?”

TPR = TP / (TP + FN)

10
Q

Formula for True Negative Rate (TNR) = Specificity

A

“How well do I reject negatives?”

TNR = TN / (TN + FP)

11
Q

Formula for False Negative Rate (FNR).

A

FNR = FN / (FN + TP) = 1 - TPR.

12
Q

Formula for False Positive Rate (FPR).

A

FPR = FP / (FP + TN) = 1 - TNR.

13
Q

Formula for Accuracy (ACC).

A

Overall correctness (can be misleading for imbalanced classes).
ACC = (TP + TN) / (TP + TN + FP + FN).

14
Q

Formula for Balanced Accuracy (BACC).

A

BACC = (TPR + TNR) / 2;
average of TPR & TNR (good for imbalance)

15
Q

What is ROC and AUC?

A

ROC curve shows TPR vs FPR across thresholds; AUC is the area under ROC (0.5 random, 1 perfect).

16
Q

Formula for Positive Predictive Value (Precision).

A

“How precise are my alarms?”
PPV = TP / (TP + FP).

17
Q

Formula for F1-score.

A

Harmonic mean of Precision & Recall (balances the two).

F1 = 2 * (Precision * Recall) / (Precision + Recall) = 2TP / (2TP + FP + FN).
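A quick check of the two equivalent forms in plain Python (the TP/FP/FN counts are made up):

```python
def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

# Made-up counts: TP=8, FP=2, FN=8.
precision = 8 / (8 + 2)   # 0.8
recall = 8 / (8 + 8)      # 0.5
f1 = f1_score(8, 2, 8)    # same as 2*P*R / (P + R)
```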

18
Q

Formula for Fβ-score.

A

Fβ = (1 + β²) * (Precision * Recall) / (β² * Precision + Recall).

19
Q

Formula for Mean Absolute Error (MAE).

A

MAE = (Σ|y - ŷ|) / n.

20
Q

Formula for Mean Squared Error (MSE).

A

MSE = (Σ(y - ŷ)²) / n.

21
Q

Formula for Root Mean Squared Error (RMSE).

A

RMSE = sqrt(MSE).

22
Q

What does MSE emphasize?

A

Squaring the errors weights large deviations heavily, so MSE emphasizes outliers.

23
Q

What is a Hold-Out Test Set?

A

Split dataset into train, validation, and test sets; test set provides independent performance estimate.

24
Q

What is Cross Validation (CV)?

A

Split data into n folds; train n times leaving one fold out each time; average validation risk to estimate generalization.

25

Q

What is Transfer Learning?

A

Reusing knowledge from a model pre-trained on a large source task for a smaller target task.

26

Q

Two main approaches to Transfer Learning.

A

  • Option 1: Feature extractor (freeze the backbone, train a new head)
  • Option 2: Fine-tuning (train more layers or the entire model)

27

Q

When to prefer fine-tuning over feature extraction?

A

If the domain differs significantly, enough data is available, or Option 1 fails to achieve the desired accuracy.

28

Q

Why use Transfer Learning?

A

Saves compute, data, and time; pre-trained models (e.g., on HuggingFace) are readily available.

29

Q

What does the sigmoid function do?

A

Also called the logistic function: it maps the output to (0, 1). The activation after sigmoid can be interpreted as the probability of belonging to the positive class.
30

Q

Why use PyTorch for machine learning (e.g., instead of NumPy)?

A

  • GPU support (NumPy is CPU-only)
  • Dynamic computation graphs with autograd
  • A Pythonic, NumPy-like API
  • A large ecosystem (TorchVision, TorchAudio)

31

Q

What is a Tensor in PyTorch?

A

The core data structure, similar to NumPy arrays but with GPU and autograd support.

32

Q

Difference between RAM and VRAM in PyTorch?

A

VRAM is GPU memory; PyTorch allows moving data to VRAM for parallel computation on the GPU.
33

Q

What is the goal of training a neural network?

A

To choose weights so the model output ŷ approximates the true target y by minimizing a loss function L(ŷ, y).

34

Q

Why are iterative methods used in training?

A

Because we usually cannot compute the minimum of the loss directly.

35

Q

How does gradient-based optimization work?

A

Compute the loss → compute gradients w.r.t. the weights → update the weights by subtracting the learning rate times the gradient (θ ← θ − η∇L).
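The update rule θ ← θ − η∇L can be sketched with autograd on a toy scalar problem (the loss function and learning rate are made up for illustration):

```python
import torch

# Toy problem: minimize L(theta) = (theta - 3)^2, whose minimum is theta = 3.
theta = torch.tensor(0.0, requires_grad=True)
lr = 0.1  # learning rate (eta)

for _ in range(100):
    loss = (theta - 3.0) ** 2    # compute the loss
    loss.backward()              # compute the gradient dL/dtheta
    with torch.no_grad():
        theta -= lr * theta.grad # theta <- theta - eta * gradient
    theta.grad.zero_()           # clear the gradient for the next iteration
```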
36

Q

What is torch.utils.data.Dataset used for?

A

To create custom datasets with a standardized interface for accessing and optionally transforming data samples (e.g., preprocessing or augmentation). A subclass must implement __len__() and __getitem__().

37

Q

What does __getitem__() in a Dataset return?

A

  • A single sample (e.g., image, label, ID), with optional preprocessing and augmentation
  • The shape and type of a single sample can be derived from reading the __getitem__() method
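A minimal sketch of a custom Dataset that returns (ID, features, label); the data here is synthetic:

```python
import torch
from torch.utils.data import Dataset

class ToyDataset(Dataset):
    """Minimal map-style dataset over synthetic data."""

    def __init__(self, n: int = 10):
        self.x = torch.randn(n, 4)            # features
        self.y = torch.randint(0, 2, (n,))    # binary labels

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        # Return (sample_id, features, label); the ID helps debugging.
        return idx, self.x[idx], self.y[idx]

ds = ToyDataset()
sample_id, features, label = ds[3]
```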
38

Q

What is torch.utils.data.DataLoader?

A

A DataLoader wraps a Dataset and makes it possible to:
  • Iterate in minibatches (not one item at a time)
  • Shuffle the data each epoch
  • Use multiple workers (processes) to load data in parallel
  • Apply collate logic to combine samples into a batch tensor

39

Q

How to customize batching in DataLoader?

A

Use the collate_fn argument: it combines the samples extracted by __getitem__() into a single minibatch.

40

Q

Typical steps to use DataLoader?

A

1. Derive a Dataset class with __getitem__() and __len__().
2. Create a dataset instance.
3. Create Subset splits.
4. Create a DataLoader with batch_size, shuffle, and num_workers.
5. Loop over the DataLoader to get minibatches.
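The five steps can be sketched as follows (TensorDataset stands in for a custom Dataset class; all sizes are arbitrary):

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Steps 1-2: a dataset instance (synthetic features and labels).
data = TensorDataset(torch.randn(100, 4), torch.randint(0, 2, (100,)))

# Step 3: fixed index splits via Subset.
train_set = Subset(data, list(range(80)))
val_set = Subset(data, list(range(80, 100)))

# Step 4: DataLoader with batch_size, shuffle, num_workers.
train_loader = DataLoader(train_set, batch_size=16, shuffle=True, num_workers=0)

# Step 5: loop over the loader to get minibatches.
n_batches = 0
for xb, yb in train_loader:
    n_batches += 1
```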
41

Q

Best practice regarding shuffle for validation/test sets in DataLoader?

A

Disable shuffling, because evaluation does not depend on sample order.

42

Q

How to ensure reproducibility with DataLoader?

A

1. Store the dataset split indices.
2. Use torch.utils.data.Subset (i.e., fix the indices).
3. Set random seeds.

43

Q

Do you need to convert samples to tensors inside the Dataset?

A

No; the DataLoader's default collate function can handle NumPy arrays and convert them to tensors.

44

Q

Why include a sample ID in the Dataset return value?

A

For debugging and traceability.
45

Q

What is torch.nn.Module?

A

The base class for all neural networks and layers; it registers parameters and submodules automatically.

46

Q

Which methods must a custom nn.Module implement?

A

__init__() to define the layers and forward() to specify how input is transformed into output.

47

Q

Why call the module instance directly instead of forward()?

A

Calling the instance runs forward() and triggers any registered hooks automatically.

48

Q

How to move modules or parameters to the GPU?

A

Use to(device=...) to move them to a GPU device.

49

Q

Where to find many predefined modules?

A

The torch.nn library (https://pytorch.org/docs/stable/nn.html).
50

Q

Describe a Fully-Connected Feed-Forward Neural Network (FFNN).

A

Every neuron is connected to all neurons in the next layer; high complexity; typically requires flattening multidimensional input.

51

Q

Describe a Convolutional Neural Network (CNN).

A

Applies kernels (filters) across the input dimensions to capture spatial or temporal patterns, often followed by pooling and flattening.

52

Q

Difference between 1D, 2D, and nD CNNs?

A

1D for sequences (e.g., time series), 2D for images; higher dimensions are possible but rare.

53

Q

What is a Recurrent Neural Network (RNN)?

A

Processes sequential inputs, reusing the same weights and a hidden state; LSTMs are a common variant, and Transformers are the modern alternative for sequence data.
54

Q

Key properties of ReLU.

A

Computationally cheap and avoids vanishing gradients; the most common activation.

55

Q

Key properties of SELU.

A

Self-normalizing, which improves training stability.

56

Q

Key properties of Sigmoid.

A

Outputs in (0, 1); an older activation, prone to vanishing gradients.

57

Q

What is autograd?

A

PyTorch's automatic differentiation engine: it builds computation graphs and computes gradients via backward().
58

Q

Which method computes gradients of a loss tensor?

A

loss.backward()

59

Q

Which optimizers are common in PyTorch?

A

torch.optim.SGD (with optional momentum) and torch.optim.Adam (adaptive learning rates and momentum).

60

Q

Steps in a weight update cycle?

A

1. optimizer.zero_grad()
2. loss.backward()
3. optimizer.step()
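The three steps in the context of one training iteration (the model, data, and loss here are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x, y = torch.randn(8, 4), torch.randn(8, 1)  # synthetic minibatch

optimizer.zero_grad()         # 1. clear old gradients
loss = loss_fn(model(x), y)   # forward pass and loss
loss.backward()               # 2. compute gradients
optimizer.step()              # 3. update weights
```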
61

Q

Common loss functions for regression.

A

Mean Squared Error (torch.nn.MSELoss), typically with no output activation.

62

Q

Common loss functions for classification.

A

Cross-Entropy (torch.nn.CrossEntropyLoss) or, for binary tasks, BCEWithLogitsLoss (which includes the sigmoid internally).

63

Q

Why is BCEWithLogitsLoss used instead of sigmoid + BCELoss?

A

It combines the sigmoid and binary cross-entropy in one operation for numerical stability.
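A quick numerical check of the equivalence (the logits and targets are made up):

```python
import torch
import torch.nn as nn

logits = torch.tensor([2.0, -1.0, 0.5])
targets = torch.tensor([1.0, 0.0, 1.0])

# Numerically stable: the sigmoid is fused into the loss.
stable = nn.BCEWithLogitsLoss()(logits, targets)

# Mathematically equivalent but less stable two-step version.
unstable = nn.BCELoss()(torch.sigmoid(logits), targets)
```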
64

Q

Difference between update and epoch.

A

Update = one weight update; epoch = one full pass through the training data.

65

Q

What is early stopping?

A

Stop training when the validation loss does not improve for m updates/epochs, and use the model with the best validation loss.
66

Q

What are good classification metrics for imbalanced classes?

A

  • Balanced Accuracy (BACC)
  • AUC (ranking ability across thresholds)
  • F1 (balances precision & recall)
  • Fβ (tunes recall vs. precision)

67

Q

Purpose of monitoring tools like TensorBoard.

A

Visualize losses, weights, and gradients; monitor training and validation performance.

68

Q

First step if the model is not learning.

A

Check whether gradients and weights are changing; consider gradient clipping.

69

Q

Recommended workflow before regularizing.

A

First find a model that can overfit, then make it smaller or add regularization.
70

Q

What does the Softmax function do?

A

Converts logits into class probabilities in multi-class classification; often used together with CrossEntropyLoss (which applies log-softmax internally).
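A plain-Python sketch of softmax, using the standard max-subtraction trick for numerical stability (the logits below are made up):

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1.
    Subtracting the max first avoids overflow in exp()."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```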
71

Q

What's the output range of Tanh, sigmoid, softmax, and ReLU?

A

  • Tanh: (−1, 1). Zero-centered output, helpful for RNNs or mean-centered data; can still saturate.
  • Sigmoid: (0, 1). Good for probabilities in a binary classification output layer.
  • Softmax: (0, 1) per class, with all outputs summing to 1.
  • ReLU: [0, ∞). The default in most modern nets; fast, sparse activations. Watch out for “dying ReLU”.

72

Q

Typical normalization approaches

A

  • Min-Max scaling (to [0, 1] or [−1, 1])
  • Standardization (mean = 0, std = 1; the default for most ML models)
  • Robust scaling (median = 0, IQR = 1)

73

Q

What can be the reasons if the training loss increases?

A

It means the model predictions are getting worse (larger error). Possible reasons:
  • Learning rate too high: the optimizer overshoots minima
  • Bad initialization or exploding gradients
  • A bug in the loss computation or data pipeline
  • A sudden change in the data distribution (e.g., reshuffling with different data properties)
74

Q

Image normalization: min-max scaling formula?

A

(img - img.min()) / (img.max() - img.min()), i.e., (x - min) / (max - min)

75

Q

z-score formula for normalization

A

Goal: center the data and give it unit variance.
(x - mean) / std
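Both formulas in plain Python (the input values are toy data; population std assumed for the z-score):

```python
def min_max_scale(xs):
    """Scale values to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def z_score(xs):
    """Standardize to mean 0 and (population) std 1: (x - mean) / std."""
    mean = sum(xs) / len(xs)
    std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mean) / std for x in xs]

scaled = min_max_scale([2.0, 4.0, 6.0])        # → [0.0, 0.5, 1.0]
standardized = z_score([2.0, 4.0, 6.0])        # mean 0, unit variance
```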
76

Q

Types of gradient descent / training procedures

A

  • Full-batch (a.k.a. batch gradient descent): uses all training samples to compute the gradient, making one parameter update per epoch.
  • Mini-batch gradient descent: splits the dataset into smaller batches (e.g., 32 or 64 samples); each batch produces one gradient estimate and one update.
  • Online / stochastic gradient descent (SGD): uses a batch size of 1 sample for each update.
77

Q

What are the two Dataset types in PyTorch?

A

Map-style datasets and iterable-style datasets.

78

Q

What's the difference between torch.jit.trace and torch.jit.script?

A

torch.jit.trace “replays” the route taken by the example input; it cannot see data-dependent branching (if statements), and it does not keep the example input. torch.jit.script keeps Python control flow and conditions alive.
79

Q

What does torch.utils.data.Subset do?

A

It wraps an existing Dataset and exposes only the samples whose indices you pass in:
subset = Subset(dataset, indices)
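A minimal sketch (TensorDataset and the index list are arbitrary placeholders):

```python
import torch
from torch.utils.data import Subset, TensorDataset

data = TensorDataset(torch.arange(10).float())

# Expose only the samples at indices 2, 5, and 7 of the underlying dataset.
subset = Subset(data, [2, 5, 7])

length = len(subset)   # 3
first = subset[0]      # the sample at original index 2
```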
80

Q

What's the default input layout of images in PyTorch?

A

N × C × H × W: number of images × number of channels × height × width.

81

Q

What does the “batch shape” tuple consist of?

A

(batch size, sample shape)