Computer vision Flashcards

Question 1

Q

Why can’t a plain CNN do object detection

Answer

A

A plain classification CNN outputs a single label for the whole image, so it can’t identify or locate multiple objects in one photo.

Question 2

Q

What is the goal of an R-CNN

Answer

A

The goal of a region-based convolutional neural network (R-CNN) is to take in an image and correctly identify where
the main objects in the image are, marking each one with a bounding box.

Question 3

Q

What is computer vision

Answer

A

Computer vision is a field of AI that enables computers to understand and interpret images or video.

Question 4

Q

What is object detection

Answer

A

Object detection identifies what objects are in an image and where they are located.

Question 5

Q

What is image segmentation

Answer

A

Image segmentation divides an image into regions or objects at the pixel level.
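
To make "at the pixel level" concrete, here is a minimal sketch that turns per-pixel class scores into a label map by picking the highest-scoring class for each pixel. The class names and score values are made up for illustration; a real model would produce the scores.

```python
# Hypothetical class names, for illustration only.
CLASSES = ["background", "person", "car"]

def segment(scores):
    """Return a label map: the index of the highest-scoring class per pixel.

    scores[row][col] is a list with one score per class for that pixel.
    """
    return [
        [max(range(len(px)), key=px.__getitem__) for px in row]
        for row in scores
    ]

# Toy scores for a 2x3 image (made-up numbers).
scores = [
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1]],
    [[0.8, 0.1, 0.1], [0.3, 0.3, 0.4], [0.1, 0.1, 0.8]],
]
label_map = segment(scores)
named = [[CLASSES[i] for i in row] for row in label_map]
```

Every pixel ends up with exactly one class, which is what distinguishes segmentation from drawing a box around an object.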

Question 6

Q

What is the pipeline for R-CNN

Answer

A

Step 1 — Input image
The original image is given to the model.
Step 2 — Region proposals
Around 2000 candidate regions are generated
Done using Selective Search
Step 3 — CNN feature extraction
Each region is passed through a CNN to extract features.
Step 4 — Classification
Each region is classified (e.g., person, car).
Step 5 — Bounding box refinement
The predicted bounding boxes are adjusted to improve accuracy.
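
Step 5 can be sketched as follows. This assumes the commonly used box parameterization, where the regressor predicts offsets (dx, dy, dw, dh) relative to the proposal: the centre shifts in proportion to the box size and the width and height scale exponentially. The function name and the example numbers are illustrative, not taken from the cards.

```python
import math

def refine_box(box, deltas):
    """Apply predicted regression offsets to a proposal box.

    box    = (cx, cy, w, h): proposal centre, width and height
    deltas = (dx, dy, dw, dh): offsets predicted by the regressor
    """
    cx, cy, w, h = box
    dx, dy, dw, dh = deltas
    return (
        cx + w * dx,          # shift centre x by a fraction of the width
        cy + h * dy,          # shift centre y by a fraction of the height
        w * math.exp(dw),     # scale width (always stays positive)
        h * math.exp(dh),     # scale height (always stays positive)
    )

# A proposal nudged right and up, with its size unchanged.
refined = refine_box((50.0, 40.0, 20.0, 10.0), (0.1, -0.2, 0.0, 0.0))
```

With all offsets at zero the proposal is returned unchanged, so the regressor only has to learn small corrections.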

Question 7

Q

What loss function does R-CNN use

Answer

A

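The loss combines:
Classification loss + Bounding box regression loss
In the original R-CNN the classifier and the box regressor are trained as separate stages, each with its own loss; Fast R-CNN and Faster R-CNN add the two terms into a single multi-task loss.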
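
A minimal sketch of how the two terms can be combined for one region, assuming the formulation used by Fast R-CNN: cross-entropy for the class and smooth L1 for the box, with the box term skipped for background regions. Treating class 0 as background and using a weight `lam` of 1.0 are assumptions for this example.

```python
import math

def cross_entropy(probs, true_class):
    """Classification loss: negative log-probability of the true class."""
    return -math.log(probs[true_class])

def smooth_l1(pred, target):
    """Box regression loss: quadratic for small errors, linear for large ones."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def detection_loss(probs, true_class, pred_box, target_box, lam=1.0):
    """Classification loss plus weighted box loss for a single region."""
    cls = cross_entropy(probs, true_class)
    # Assumption: class 0 is background, which has no box to regress.
    reg = smooth_l1(pred_box, target_box) if true_class != 0 else 0.0
    return cls + lam * reg

loss = detection_loss([0.1, 0.9], 1, (0.0, 0.0, 0.0, 0.0), (0.5, 0.0, 2.0, 0.0))
```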

Question 8

Q

What are the cons of R-CNN

Answer

A

Computationally very expensive, as about 2000 region proposals have to be classified per image.
The selective search algorithm is fixed, so no learning happens at that stage. This can lead to bad candidate
region proposals.
Very slow

Question 9

Q

Why is R-CNN so slow

Answer

A

It requires a forward pass of the CNN (AlexNet) for every single region proposal for every single image (that is around 2k forward passes per image!).
It has to train three different models separately: the CNN to generate image features, the classifier that predicts the class, and the regression model to tighten the bounding boxes.

Question 10

Q

What is Fast R-CNN

Answer

A

Fast R-CNN improves the efficiency of R-CNN.
Key idea
Instead of running the CNN for every region:
Run the CNN once for the whole image.
Then extract features from the resulting feature map.

Question 11

Q

What does Region of interest pooling allow

Answer

A

Take the shared feature map
Extract features for each proposed region
Convert them into fixed-size feature vectors

These vectors are then used for classification.
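
The steps above can be sketched with a small max-pooling routine. This is a simplified, single-channel illustration with integer bin edges, not the exact implementation of any library; the function name and the toy feature map are made up.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one region of a 2-D feature map into an out_h x out_w grid.

    feature_map: list of rows (single channel, for simplicity)
    roi: (top, left, bottom, right) in feature-map cells, bottom/right exclusive
    """
    top, left, bottom, right = roi
    h, w = bottom - top, right - left
    pooled = []
    for i in range(out_h):
        # Integer bin edges; every bin covers at least one cell.
        r0 = top + (i * h) // out_h
        r1 = max(top + ((i + 1) * h) // out_h, r0 + 1)
        row = []
        for j in range(out_w):
            c0 = left + (j * w) // out_w
            c1 = max(left + ((j + 1) * w) // out_w, c0 + 1)
            row.append(max(
                feature_map[r][c]
                for r in range(r0, r1)
                for c in range(c0, c1)
            ))
        pooled.append(row)
    return pooled

fmap = [
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],
    [13, 14, 15, 16, 17, 18],
    [19, 20, 21, 22, 23, 24],
]
whole = roi_max_pool(fmap, (0, 0, 4, 6), 2, 2)   # a 4x6 region
part = roi_max_pool(fmap, (1, 1, 4, 4), 2, 2)    # a 3x3 region
```

Both regions come out as 2x2 grids even though they differ in size, which is what lets one classifier handle every proposal.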

Question 12

Q

What are the cons of Fast R-CNN

Answer

A

Fast R-CNN still relies on selective search to generate the region proposals.
Selective search is slow, so it becomes the bottleneck for the speed of
the whole network.

Question 13

Q

What are the pros of Fast R-CNN

Answer

A

Fast R-CNN is faster than R-CNN because you don’t have to feed
2000 region proposals through the convolutional neural network for every image. Instead, the
convolution operation is done only once per image, producing a single feature map that all
proposals share.
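
A quick back-of-the-envelope sketch of the saving, counting only forward passes of the CNN backbone. The batch of 10 images is an arbitrary example, and the helper name is made up for illustration.

```python
def backbone_passes(num_images, proposals_per_image, shared_feature_map):
    """Count CNN backbone forward passes needed for a set of images."""
    if shared_feature_map:
        # Fast R-CNN: one pass per image, proposals reuse the feature map.
        return num_images
    # R-CNN: one pass per region proposal.
    return num_images * proposals_per_image

rcnn_passes = backbone_passes(10, 2000, shared_feature_map=False)
fast_passes = backbone_passes(10, 2000, shared_feature_map=True)
```

For 10 images that is 20000 backbone passes for R-CNN against 10 for Fast R-CNN. The per-region work that remains in Fast R-CNN (RoI pooling and the small classification head) is much cheaper than a full CNN pass.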

Question 14

Q

What is the Faster R-CNN pipeline

Answer

A

CNN extracts feature map
Region Proposal Network (RPN) generates proposals
RoI pooling extracts features
Classifier predicts object classes, and a box regressor refines the bounding boxes.
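
The data flow of these stages can be sketched as a function that wires them together. The four stage arguments are hypothetical callables standing in for real networks; the toy stand-ins below exist only so the sketch runs and are not meant to resemble trained models.

```python
def faster_rcnn_forward(image, backbone, rpn, roi_pool, head):
    """Run the stages in order; each stage is passed in as a callable."""
    feature_map = backbone(image)        # 1. shared features, computed once
    proposals = rpn(feature_map)         # 2. proposals come from the features
    detections = []
    for box in proposals:
        roi_features = roi_pool(feature_map, box)      # 3. fixed-size features
        detections.append((box, head(roi_features)))   # 4. class prediction
    return detections

# Toy stand-ins (assumptions for illustration only).
image = [[0.0, 2.0], [3.0, 0.0]]
detections = faster_rcnn_forward(
    image,
    backbone=lambda img: img,                          # identity "features"
    rpn=lambda fmap: [(0, 1), (1, 1)],                 # two fixed "proposals"
    roi_pool=lambda fmap, box: fmap[box[0]][box[1]],   # read one cell
    head=lambda feat: "object" if feat > 0 else "background",
)
```

The point of the sketch is the ordering: the RPN reads the same feature map that RoI pooling later reuses, so the image itself is only processed once.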

Question 15

Q

What is region of interest pooling

Answer

A

After proposals are generated:
1. RoI pooling extracts fixed-size feature maps
2. These features are used to classify the object.

This ensures the network can classify objects regardless of their original size.

Question 16

Q

RoI pooling (continued)

Answer

A

After the RPN step, we have a set of object proposals with no class assigned to them. The next problem to solve is
how to take these bounding boxes and classify them into our desired categories.
Faster R-CNN addresses this by reusing the existing convolutional feature map: region of interest pooling extracts a
fixed-size feature map for each proposal. Fixed-size feature maps are needed so that the R-CNN stage can classify
them into a fixed number of classes.

Question 17

Q

What are the advantages of U-Net for semantic segmentation

Answer

A

U-Net combines the location information from the downsampling path with the contextual information in the upsampling path, giving features that carry both localisation and context, which is necessary to predict a good segmentation map.
No dense layers, so images of different sizes can be used as input (the only parameters to learn in convolution layers are the kernels, and the kernel size is independent of the input image size).
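
The point about input size can be checked with a tiny sketch: the same 2x2 kernel is applied to two images of different sizes with no change to the parameters. This is a plain "valid" sliding-window operation (cross-correlation, as deep learning libraries implement convolution); the kernel values and images are arbitrary examples.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image with no padding and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

kernel = [[1, 0], [0, -1]]     # the only parameters: 4 weights
small = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]                    # 3x3 input
large = [[r * 5 + c for c in range(5)] for r in range(4)]    # 4x5 input
out_small = conv2d_valid(small, kernel)   # 2x2 output
out_large = conv2d_valid(large, kernel)   # 3x4 output
```

Only the output size changes with the input. A dense layer, by contrast, has one weight per input element, so it fixes the input size.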

Question 18

Q

What is Mask R-CNN

Answer

A

Mask R-CNN extends Faster R-CNN to pixel-level (instance) segmentation by adding a
branch that outputs a binary mask saying whether or not a
given pixel is part of an object.
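
A minimal sketch of the last step of such a mask branch: per-pixel probabilities for one detected object are turned into a binary mask. The 0.5 threshold and the probability values are assumptions for this example.

```python
def binarize_mask(mask_probs, threshold=0.5):
    """Mark a pixel as part of the object (1) if its probability reaches the threshold."""
    return [[1 if p >= threshold else 0 for p in row] for row in mask_probs]

# Made-up per-pixel probabilities for one detected object.
mask_probs = [
    [0.1, 0.6, 0.9],
    [0.2, 0.8, 0.4],
]
mask = binarize_mask(mask_probs)
```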

(18 cards)