What is a deep neural network (DNN)?
A composition of layers f_θ = f_L ∘ f_{L−1} ∘ … ∘ f_1 that maps an input x through multiple transformations.
What is the forward pass through layer l?
z_{l+1} = σ_l(W_l z_l + b_l).
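A minimal sketch of this update in NumPy, assuming a ReLU activation and illustrative shapes (3 inputs, 4 hidden units); the weights are random stand-ins for trained values:

```python
import numpy as np

def relu(u):
    return np.maximum(0.0, u)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # W_l: maps 3 inputs to 4 hidden units
b = np.zeros(4)               # b_l
z = rng.normal(size=3)        # z_l: activations from the previous layer

a = W @ z + b                 # pre-activation
z_next = relu(a)              # z_{l+1} = sigma_l(W_l z_l + b_l)
```

Stacking this update L times, each layer feeding the next, gives the full forward pass.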
What is a shallow neural network?
A neural network with one hidden layer: f_θ(x) = ηᵀσ(Wx + b).
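A sketch of the shallow model, with tanh standing in for σ; all sizes and the weights W, b, η are illustrative assumptions, not trained values:

```python
import numpy as np

def shallow_net(x, W, b, eta):
    """One-hidden-layer network: f(x) = eta^T sigma(W x + b)."""
    hidden = np.tanh(W @ x + b)   # sigma applied elementwise
    return eta @ hidden           # linear readout of the hidden layer

rng = np.random.default_rng(1)
W = rng.normal(size=(5, 2))       # 2 inputs, 5 hidden units
b = rng.normal(size=5)
eta = rng.normal(size=5)

y = shallow_net(np.array([0.3, -0.7]), W, b, eta)  # scalar output
```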
What are W and b in a neural network layer?
W are weights and b are biases for that layer.
What are activations?
The outputs after applying the activation function: z_l = σ_{l−1}(W_{l−1} z_{l−1} + b_{l−1}).
What are pre-activations?
The linear part computed before the activation function is applied: a_l = W_{l−1} z_{l−1} + b_{l−1}.
What is a neuron?
A single hidden unit computing σ(wᵀx + b).
What is network depth?
The number of hidden layers (some conventions also count the output layer, adding 1).
What is network width?
The number of neurons in a given layer.
What is network capacity?
The total number of hidden units (neurons) across the network; a rough proxy for how complex a function it can represent.
What does a sigmoid activation do?
σ(u) = 1 / (1 + e^(−u)).
What is the ReLU activation?
σ(u) = max(0,u).
What is tanh activation?
σ(u) = tanh(u).
What is the step activation?
σ(u) = 1_{u > 0}.
What is the Swish activation?
σ(u) = u / (1 + e^(−βu)).
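The five activations above can be implemented elementwise in a few lines of NumPy (β = 1 for Swish is an assumption here; tanh comes built in):

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def relu(u):
    return np.maximum(0.0, u)

def step(u):
    return (u > 0).astype(float)

def swish(u, beta=1.0):
    return u * sigmoid(beta * u)   # equivalently u / (1 + e^(-beta u))

u = np.array([-2.0, 0.0, 2.0])
sigmoid(u)   # smooth squashing into (0, 1)
relu(u)      # [0., 0., 2.]
np.tanh(u)   # smooth squashing into (-1, 1)
step(u)      # [0., 0., 1.]
swish(u)     # smooth, slightly non-monotone near zero
```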
What property do neural networks share with RBF kernels?
They are universal approximators: they can approximate any continuous function on a compact set to arbitrary accuracy.
Why use deep instead of shallow networks?
Depth buys parameter efficiency: some functions that a deep network represents with few parameters would require exponentially many neurons in a shallow network.
What is hierarchical feature learning?
Hidden layers learn progressively more abstract features (edges → shapes → object parts → concepts).
Why does the last layer behave like a shallow model?
It is linear in its parameters: f_θ(x) = ηᵀz, where z is the feature vector produced by all earlier layers.
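A sketch illustrating this view: treating the hidden layers as a frozen feature map z = g(x), the output ηᵀz is linear in η (and in z). All weights below are random stand-ins for trained ones:

```python
import numpy as np

def features(x, layers):
    """Apply all-but-last layers (ReLU hidden layers) to get the feature vector z."""
    z = x
    for W, b in layers:
        z = np.maximum(0.0, W @ z + b)
    return z

rng = np.random.default_rng(2)
layers = [(rng.normal(size=(8, 4)), np.zeros(8)),
          (rng.normal(size=(6, 8)), np.zeros(6))]
eta = rng.normal(size=6)           # last-layer weights

x = rng.normal(size=4)
z = features(x, layers)            # learned feature vector
f = eta @ z                        # output: a shallow (linear) model on z

# Linearity in the last-layer parameters: doubling eta doubles the output.
assert np.isclose((2 * eta) @ z, 2 * f)
```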
How can we inspect what features a DNN learns?
By analysing which input patterns strongly activate specific neurons.
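One simple probe of this idea, sketched for a single random layer: scan a batch of inputs and keep the one that most strongly activates a chosen neuron. All data and weights here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
W, b = rng.normal(size=(5, 3)), np.zeros(5)   # one ReLU layer, 5 neurons
X = rng.normal(size=(100, 3))                  # batch of 100 inputs

acts = np.maximum(0.0, X @ W.T + b)            # activations, shape (100, 5)
neuron = 2                                     # neuron under inspection
top = np.argmax(acts[:, neuron])               # index of max-activating input
x_top = X[top]                                 # the pattern this neuron responds to
```

In trained vision networks the same procedure, applied per layer, is what reveals the edge → shape → part progression.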
What is data-driven feature learning?
Learning features automatically from data rather than manually designing them.
Why are GPUs used for DNN training?
GPUs perform thousands of operations in parallel, enabling fast training of large models.
What is a spurious feature?
A feature correlated with the label in training data but not truly relevant to the task.
Example of spurious features?
Skin lesion classification using presence of a ruler as a signal for malignancy.