What is a convolutional neural network (CNN)
A deep neural network designed for processing visual data.
CNNs detect patterns in images by using convolutional layers to learn spatial hierarchies of features.
What is spatial locality in images
Images have spatial structure: nearby pixels are highly correlated.
Features therefore exist in local regions.
How do CNNs exploit spatial locality
Through local receptive fields: each unit looks only at a small region of the input rather than the whole image.
What is convolution
What mathematical operation is involved in convolution
A mathematical operation where:
A small matrix (kernel/filter) slides over the image
Performs:
Element-wise multiplication
Summation
Produces a feature map
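The sliding-window operation above can be sketched in plain Python. This is only an illustration, assuming stride 1 and no padding; the function name conv2d_valid and the sample image and kernel are made up for the example. Like most deep learning libraries, the sketch does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h, out_w = img_h - k_h + 1, img_w - k_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise multiplication of the kernel with the patch under it,
            # followed by summation into a single output value.
            total = 0
            for a in range(k_h):
                for b in range(k_w):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 kernel that responds to a dark-to-bright step from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the column where the kernel sits on the edge, which is what "detecting" a pattern means here.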
What does a kernel detect
Each kernel (filter) detects one local pattern, such as an edge or a texture; filters in deeper layers respond to parts of objects.
What do stride and padding control
Stride controls how far the filter moves at each step; padding preserves spatial dimensions.
What does padding do
Controls output size:
valid → no padding → output smaller than input
same → zero padding → output same size as input
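The effect of stride and padding on output size can be checked with a small helper. This is a sketch, assuming the 'valid'/'same' convention used by Keras and TensorFlow; the function name conv_output_size is made up for the example, and it handles one spatial dimension at a time.

```python
import math


def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Output length along one spatial dimension of a convolution."""
    if padding == "valid":
        # The kernel must fit entirely inside the input.
        return (input_size - kernel_size) // stride + 1
    if padding == "same":
        # Enough zeros are added that only the stride shrinks the output.
        return math.ceil(input_size / stride)
    raise ValueError("padding must be 'valid' or 'same'")


valid_s1 = conv_output_size(28, 3, padding="valid")
same_s1 = conv_output_size(28, 3, padding="same")
valid_s2 = conv_output_size(28, 3, stride=2, padding="valid")
same_s2 = conv_output_size(28, 3, stride=2, padding="same")
print(valid_s1, same_s1, valid_s2, same_s2)  # 26 28 13 14
```

Note that 'same' keeps the output exactly the size of the input only when the stride is 1.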
Define convolutional layers
Extract feature maps from images.
Define pooling layers
Reduce spatial size (dimensionality) and retain key information.
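A minimal sketch of pooling, assuming max pooling with non-overlapping 2x2 windows (one common choice; the notes do not say which pooling type is meant). The function name max_pool2d and the sample values are made up for the example.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [
                feature_map[i * size + a][j * size + b]
                for a in range(size)
                for b in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool2d(feature_map)
print(pooled)  # [[4, 2], [2, 8]]
```

The 4x4 map shrinks to 2x2, and the strongest response in each window is the "key information" that survives.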
What do activation functions do
Activation functions introduce non-linearity (e.g. ReLU) or turn the final outputs into probabilities (e.g. softmax).
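Both roles can be sketched in a few lines. ReLU and softmax are assumed here as typical examples of a non-linearity and a probability-generating function; the input values are made up.

```python
import math


def relu(values):
    """Replace negative values with zero; this is the non-linearity."""
    return [max(0.0, v) for v in values]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


activated = relu([-2.0, 0.5, 3.0])
probs = softmax([2.0, 1.0, 0.1])
print(activated)  # [0.0, 0.5, 3.0]
print(probs)      # three probabilities, largest first, summing to 1
```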
What do fully connected layers do
Perform final decision-making.
What is step 1 of CNN training
Forward pass: the CNN processes input images using filters to find patterns/features (convolution + pooling).
What is step 2 of CNN training
A loss function measures the difference between the predicted and actual class.
What is step 3 of CNN training
The CNN computes gradients of the loss with respect to the filter weights using the chain rule (backpropagation).
What is the final step of CNN training
The CNN adjusts its filter weights to minimise the error using an optimisation algorithm (e.g. SGD, Adam).
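The four steps can be seen end to end on a toy problem. This is a heavily simplified sketch, not a real CNN: it assumes a single 1D filter with two weights, a mean squared error loss instead of a classification loss, and plain full-batch gradient descent instead of SGD or Adam. All names and data are made up for the example.

```python
def forward(signal, weights):
    """Step 1: forward pass of a 1D convolution with a 2-element filter (stride 1, no padding)."""
    return [
        weights[0] * signal[i] + weights[1] * signal[i + 1]
        for i in range(len(signal) - 1)
    ]


def mse(predicted, target):
    """Step 2: loss, the mean squared difference between prediction and target."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def gradients(signal, predicted, target):
    """Step 3: chain rule, dL/dw_k = mean(2 * (p_i - t_i) * x_{i+k})."""
    n = len(target)
    grads = [0.0, 0.0]
    for i, (p, t) in enumerate(zip(predicted, target)):
        for k in range(2):
            grads[k] += 2 * (p - t) * signal[i + k] / n
    return grads


signal = [0.0, 1.0, 3.0, 2.0, 5.0, 4.0]  # made-up input
target = forward(signal, [-1.0, 1.0])     # output of the filter we want to recover
weights = [0.0, 0.0]                      # filter weights to be learned
learning_rate = 0.02
losses = []

for step in range(500):
    predicted = forward(signal, weights)
    losses.append(mse(predicted, target))
    grads = gradients(signal, predicted, target)
    # Step 4: gradient-descent update of the filter weights.
    weights = [w - learning_rate * g for w, g in zip(weights, grads)]

print(weights)                # close to [-1.0, 1.0]
print(losses[0], losses[-1])  # the loss falls towards 0
```

The learned weights approach the filter that generated the target, which is the same mechanism by which a CNN learns its filters from labelled images.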
Explain convolutional layers in more detail
A kernel (filter) slides over the image, performing element-wise multiplication and summation at each position to produce one value of the feature map.
Define stride
Stride is the step size by which the convolutional filter (kernel) moves across the input image during the convolution operation.
What does padding='valid' mean
padding='valid' means that the convolution is computed only where the input and the filter fully overlap, so the output is smaller than the input.
What does padding='same' mean
padding='same' means that the border of the input is padded with zeros so that, with a stride of 1, the output is the same size as the input.
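Zero padding itself is simple to sketch. The example below assumes stride 1 and an odd kernel size, in which case (kernel_size - 1) // 2 zeros on each side are enough to keep the output size equal to the input size. The helper name zero_pad2d and the sample image are made up.

```python
def zero_pad2d(image, pad):
    """Surround `image` with a border of `pad` zeros on every side."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))
    return padded


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
kernel_size = 3
pad = (kernel_size - 1) // 2  # 1 zero per side for a 3x3 kernel
padded = zero_pad2d(image, pad)

valid_size = len(image) - kernel_size + 1   # output size without padding
same_size = len(padded) - kernel_size + 1   # output size after zero padding
print(padded)
print(valid_size, same_size)  # 1 3
```

Without padding a 3x3 kernel reduces the 3x3 image to a single value; with one ring of zeros the output is 3x3 again.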
Show a CNN architecture example
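One possible answer, as a sketch: a small, hypothetical architecture for 28x28 grayscale images and 10 classes (convolution → pooling → convolution → pooling → flatten → fully connected). The layer sizes are assumptions chosen for illustration. The code only traces how the tensor shape changes from layer to layer; it does not build or train a model.

```python
def conv_valid(shape, filters, kernel=3):
    """Shape after a stride-1 convolution with 'valid' padding."""
    h, w, _ = shape
    return (h - kernel + 1, w - kernel + 1, filters)


def pool(shape, size=2):
    """Shape after non-overlapping pooling with a size x size window."""
    h, w, c = shape
    return (h // size, w // size, c)


# Hypothetical layer stack: (description, function mapping input shape to output shape).
layers = [
    ("conv 3x3, 32 filters + ReLU", lambda s: conv_valid(s, 32)),
    ("max pool 2x2", pool),
    ("conv 3x3, 64 filters + ReLU", lambda s: conv_valid(s, 64)),
    ("max pool 2x2", pool),
    ("flatten", lambda s: (s[0] * s[1] * s[2],)),
    ("fully connected, 10 units + softmax", lambda s: (10,)),
]

shape = (28, 28, 1)  # height, width, channels
shapes = [shape]
print("input", shape)
for name, apply_layer in layers:
    shape = apply_layer(shape)
    shapes.append(shape)
    print(name, shape)
```

The spatial size shrinks (28 → 26 → 13 → 11 → 5) while the number of feature maps grows (1 → 32 → 64), and the fully connected layer makes the final decision over the 10 classes.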
What is a DCNN (deep convolutional neural network)
A DCNN is simply a CNN with many convolutional layers stacked together.
What is the key idea of DCNN
Key Idea:
Early layers → detect simple features (edges)
Middle layers → detect shapes/textures
Deep layers → detect high-level objects
👉 Depth = ability to learn hierarchical features.
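One way to see why depth helps, offered as an illustration rather than a full explanation: each extra layer lets a unit see a larger patch of the original image (its receptive field), so deeper units can respond to larger structures. The sketch below uses the standard receptive-field recurrence for a stack of (kernel_size, stride) layers; the function name and the layer stacks are made up.

```python
def receptive_field(layers):
    """Input pixels seen by one output unit after a stack of (kernel_size, stride) layers."""
    field, jump = 1, 1
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump  # each layer widens the view
        jump *= stride                     # strides spread later layers further apart
    return field


# Stacks of 3x3, stride-1 convolutions of increasing depth.
sizes = [receptive_field([(3, 1)] * depth) for depth in (1, 2, 3, 5)]
print(sizes)  # [3, 5, 7, 11]

# Convolution, 2x2 pooling with stride 2, then another convolution.
with_pooling = receptive_field([(3, 1), (2, 2), (3, 1)])
print(with_pooling)  # 8
```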
What is the purpose of the pooling layer
Purpose:
Reduce spatial dimensions
Reduce computation
Improve generalisation
Control overfitting