Why can’t a plain CNN do object detection
A plain classification CNN outputs a single label for the whole image, so it can’t identify or locate multiple objects in one photo.
What is the goal of an R-CNN
The goal of a region-based convolutional neural network (R-CNN) is to take in an image and correctly identify where
the main objects are in the image, marking each one with a bounding box.
What is computer vision
Computer vision is a field of AI that enables computers to understand and interpret images or video.
What is object detection
Object detection identifies what objects are in an image and where they are located.
What is image segmentation
Image segmentation divides an image into regions or objects at the pixel level.
What is the pipeline for R-CNN
Step 1 — Input image
The original image is given to the model.
Step 2 — Region proposals
Around 2000 candidate regions are generated
Done using Selective Search
Step 3 — CNN feature extraction
Each region is passed through a CNN to extract features.
Step 4 — Classification
Each region is classified (e.g., person, car).
Step 5 — Bounding box refinement
The predicted bounding boxes are adjusted to improve accuracy.
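Step 5 can be sketched in a few lines. This is only an illustration, assuming the usual parameterisation in which the model predicts four offsets (dx, dy, dw, dh) per box: the centre shifts are relative to the box size and the size changes are log-scale factors. The function name refine_box and the example numbers are made up for these notes.

```python
import math

def refine_box(box, deltas):
    """Apply predicted offsets to a proposal box.

    box    = (x1, y1, x2, y2) in pixels
    deltas = (dx, dy, dw, dh): centre shifts relative to box size,
             width/height changes as log-scale factors (assumed form).
    """
    x1, y1, x2, y2 = box
    dx, dy, dw, dh = deltas
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h

    # Shift the centre, then rescale width and height.
    new_cx = cx + dx * w
    new_cy = cy + dy * h
    new_w = w * math.exp(dw)
    new_h = h * math.exp(dh)

    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)

proposal = (100.0, 100.0, 200.0, 300.0)
# Move the box right by 10% of its width, keep its size.
refined = refine_box(proposal, (0.1, 0.0, 0.0, 0.0))
print(refined)
```

With all offsets set to zero the proposal comes back unchanged, which is a quick sanity check on the formula.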
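What loss function does R-CNN use
The loss combines:
Classification loss + Bounding box regression loss
Note: in the original R-CNN the classifier and the box regressor are trained as separate stages. Fast R-CNN adds the two terms into a single multi-task loss.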
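A rough sketch of a combined loss for one region. The specific choices here are assumptions for illustration: cross-entropy for classification, smooth L1 for the box term (the Fast R-CNN choice), class 0 meaning background, and a weight lam between the two terms. All function names are made up for these notes.

```python
import math

def cross_entropy(probs, true_class):
    # Negative log-probability assigned to the correct class.
    return -math.log(probs[true_class])

def smooth_l1(pred, target):
    # Quadratic for small errors, linear for large ones.
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def detection_loss(probs, true_class, pred_deltas, target_deltas, lam=1.0):
    cls = cross_entropy(probs, true_class)
    # Assumption: class 0 is background, which has no box to regress.
    box = smooth_l1(pred_deltas, target_deltas) if true_class > 0 else 0.0
    return cls + lam * box

loss = detection_loss([0.1, 0.9], 1, [0.5, 0.0, 0.0, 2.0], [0.0, 0.0, 0.0, 0.0])
print(loss)
```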
What are the cons of R-CNN
Why is R-CNN so slow
The CNN is run separately on each of the roughly 2000 region proposals for every image.
What is Fast R-CNN
Fast R-CNN improves the efficiency of R-CNN.
Key idea
Instead of running the CNN for every region:
Run the CNN once for the whole image.
Then extract features from the resulting feature map.
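To extract features from the shared feature map, each region's pixel coordinates first have to be mapped to feature-map cells. A minimal sketch, assuming the backbone downsamples the image by a total stride of 16 (this depends on the network) and using a simple floor/ceil rounding that real implementations may do differently. The name project_roi is made up for these notes.

```python
import math

def project_roi(box, stride=16):
    """Map a box from image pixels to feature-map cells.

    box = (x1, y1, x2, y2) in pixels; the result is end-exclusive.
    stride = assumed total downsampling factor of the CNN backbone.
    """
    x1, y1, x2, y2 = box
    fx1 = math.floor(x1 / stride)
    fy1 = math.floor(y1 / stride)
    # Keep at least one cell in each direction.
    fx2 = max(fx1 + 1, math.ceil(x2 / stride))
    fy2 = max(fy1 + 1, math.ceil(y2 / stride))
    return fx1, fy1, fx2, fy2

print(project_roi((32, 48, 160, 200)))
```

The CNN runs once, and every proposal only costs this cheap coordinate conversion plus a lookup in the feature map.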
What does region of interest (RoI) pooling allow
Take the shared feature map
Extract features for each proposed region
Convert them into fixed-size feature vectors
These vectors are used for classification
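The steps above can be sketched in pure Python. This is a simplified illustration, assuming a single-channel feature map stored as a list of rows, a region given in feature-map cells (end-exclusive), and max pooling over a 2x2 output grid. The names roi_max_pool and ceil_div are made up for these notes.

```python
def ceil_div(a, b):
    # Integer division rounded up.
    return -(-a // b)

def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool one region of a 2D feature map into an out_h x out_w grid."""
    x1, y1, x2, y2 = roi
    h, w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_h):
        # Row range covered by output bin i.
        r0 = y1 + (i * h) // out_h
        r1 = y1 + ceil_div((i + 1) * h, out_h)
        row = []
        for j in range(out_w):
            # Column range covered by output bin j.
            c0 = x1 + (j * w) // out_w
            c1 = x1 + ceil_div((j + 1) * w, out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled

fmap = [
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],
    [13, 14, 15, 16, 17, 18],
    [19, 20, 21, 22, 23, 24],
]
big = roi_max_pool(fmap, (0, 0, 6, 4))    # 6x4 region
small = roi_max_pool(fmap, (1, 1, 4, 3))  # 3x2 region
print(big, small)
```

Both regions come out as 2x2 grids even though they started at different sizes, so flattening either one gives a vector of the same length for the classifier.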
What are the cons of Fast R-CNN
Fast R-CNN still uses Selective Search to generate the region proposals.
Selective Search is slow and time-consuming, so it becomes the bottleneck that limits the speed of
the network.
What are the pros of Fast R-CNN
Fast R-CNN is faster than R-CNN because it doesn’t feed
2000 region proposals through the convolutional neural network for every image. Instead, the
convolution operation is done only once per image, and the resulting feature map is shared by
all proposals.
What is the Faster R-CNN pipeline
What is region of interest pooling
After proposals are generated:
1. RoI pooling extracts fixed-size feature maps
2. These features are used to classify the object.
This ensures the network can classify objects regardless of their original size.
RoI pooling (continued)
After the RPN step, we have a set of object proposals with no class assigned to them. The next problem to solve is
how to take these bounding boxes and classify them into our desired categories.
Faster R-CNN does this by reusing the existing convolutional feature map: it
extracts a fixed-size feature map for each proposal using region of interest pooling. Fixed-size feature maps
are needed so that the R-CNN stage can classify the proposals into a fixed number of classes.
What are the advantages of semantic segmentation
What is Mask R-CNN
Mask R-CNN extends detection to pixel-level segmentation by adding a
branch to Faster R-CNN that outputs a binary mask for each detected object.
The mask says whether or not a given pixel is part of that object.
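A tiny sketch of what the binary mask looks like. It assumes the mask branch produces a per-pixel probability for one detected object and that a threshold of 0.5 is used to binarise it (a common choice, but an assumption here). The name binarize_mask and the numbers are made up for these notes.

```python
def binarize_mask(prob_mask, threshold=0.5):
    """Turn per-pixel probabilities into a 0/1 mask.

    1 = pixel belongs to the object, 0 = it does not.
    """
    return [[1 if p >= threshold else 0 for p in row] for row in prob_mask]

# Hypothetical 3x4 mask-branch output for one detected object.
probs = [
    [0.1, 0.7, 0.8, 0.2],
    [0.3, 0.9, 0.95, 0.4],
    [0.05, 0.6, 0.5, 0.1],
]
mask = binarize_mask(probs)
for row in mask:
    print(row)
```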