Computer Vision & Deep Learning Flashcards

(41 cards)

1
Q

What is computer vision?

A

Turns raw image data into higher-level concepts so that they can be interpreted and acted upon. Aims to categorize the visual world as we do.

2
Q

Outline the traditional machine learning flow. What were the problems with early computer vision techniques?

A

Input image
Feature extractor & features list (manual tasks)
Traditional ML algorithm (automated)

Early techniques required strict, complex rules to detect features in images, which demanded significant manual effort, and they were prone to fail given small changes in object size, rotation or data size.

3
Q

How does deep learning solve the problems of traditional machine learning?

A

By feeding it a sufficient number of well-labelled images, it automatically learns where the edges are and how pixel-level information combines to form features, determining the patterns that make an image distinct.

4
Q

What is deep learning?

A

A subfield of machine learning inspired by how the brain is structured and operates. Processes huge amounts of data, finding patterns humans often cannot detect. The word “deep” refers to the number of hidden layers in the neural network, which provide much of the power to learn.

5
Q

Give the artificial neural network term for each of the following biological components.

A

Soma - Neuron
Dendrite - Input
Axon - Output
Synapse - Weight

6
Q

What is excitatory? What is the opposite?

A

Excitatory - weights with a positive value that increase the probability of the neuron firing.
Inhibitory - weights with a negative value that decrease the probability of the neuron firing.

7
Q

What does ANN stand for? What is the ANN approach? What is the name of this function?

A

Artificial Neural Network

Each neuron computes the weighted sum of its inputs. If the sum is below a threshold value the output is 0; otherwise it is 1.

The step function.
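The ANN approach above can be sketched as a single artificial neuron with a step activation; the weights and inputs here are illustrative values, not taken from the cards.

```python
def step_neuron(inputs, weights, threshold=0.0):
    """Weighted sum of inputs; output 0 if below the threshold, else 1."""
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    return 0 if weighted_sum < threshold else 1

# Example: weighted sum = 1.0*0.6 + 0.5*(-0.4) = 0.4 >= 0, so the neuron fires.
print(step_neuron([1.0, 0.5], [0.6, -0.4]))  # 1
```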

8
Q

What is activation function? Give formula for sigmoid.

A

A function employed in the second stage of an artificial neuron; it maps the weighted-sum value to the output value.

1 / (1 + e^-x), where x is the weighted sum
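As a sanity check, the sigmoid formula translates directly to Python:

```python
import math

def sigmoid(x):
    """Sigmoid activation: maps the weighted sum x to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0.0))   # 0.5
print(sigmoid(4.0))   # close to 1: large positive sums saturate upwards
print(sigmoid(-4.0))  # close to 0: very negative sums saturate downwards
```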

9
Q

What is the purpose of activation functions?

A

To introduce non-linearities.

10
Q

What is ReLU? Give another name. Why is it the most popular?

A

The Rectified Linear Unit is an activation function defined as y = max(0, x), where x is the weighted sum.

Rectifier.

It is cheap and quick to compute, as it involves no complicated maths.
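The definition is short enough to write out in full:

```python
def relu(x):
    """Rectified Linear Unit (rectifier): y = max(0, x)."""
    return max(0.0, x)

print(relu(-2.0))  # 0.0  (negative weighted sums are clipped to zero)
print(relu(3.5))   # 3.5  (positive weighted sums pass through unchanged)
```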

11
Q

What is bias?

A

Each neuron has its own bias, which is learnable. We pass the weighted sum plus the bias into the activation function. This offers increased flexibility.

12
Q

Show how bias offers increased flexibility when we want to decrease a neuron's threshold from 0 to -1.

A

Let's say the neuron's x value (weighted sum) = -0.35.
Using ReLU, this neuron does not fire.
Now add a bias that is the opposite of the desired threshold (-1), so +1.
-0.35 + 1 = 0.65
Using ReLU, this neuron now fires.
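The worked example above, run in Python with ReLU as the activation:

```python
def relu(x):
    return max(0.0, x)

weighted_sum = -0.35
bias = 1.0  # the opposite of the desired threshold of -1

print(relu(weighted_sum))         # 0.0  -> without bias the neuron does not fire
print(relu(weighted_sum + bias))  # 0.65 -> with bias the neuron fires
```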

13
Q

What was the early drawback of ANNs? The proposed solution?

A

The major drawback was the lack of an efficient process for adjusting the model's weights.

Backpropagation. It propagates the total loss back through the neural network to determine how much of the loss every node is responsible for, then updates the weights to minimize the loss.

14
Q

Process of backpropagation.

A

For each neuron, calculate the weighted sum into the node (multiply inputs by weights and add).

Then pass this as x into the activation function to get the neuron's output value.

Error = target output of the NN - output of the last neuron.

Calculate the unit error (delta) for the last neuron: delta = output * (1 - output) * Error.

Then pass this delta back to all preceding neurons, multiplying: delta = output * (1 - output) * (weight travelled through * passed delta). Repeat this step until all layers of neurons are done.

Next, for each weight: change in weight = learning rate * delta passed to the receiving neuron * the sending neuron's output.

New weight value = old value + change in weight.

We then repeat with a new forward pass using the new weights.
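The steps above can be sketched for the smallest possible network: one input, one hidden neuron and one output neuron, both using sigmoid (the output * (1 - output) factor in the deltas is the sigmoid derivative). All the weights, the target and the learning rate here are illustrative values.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x = 1.0            # input value
w1, w2 = 0.5, 0.4  # weights: input -> hidden, hidden -> output
target = 1.0       # desired network output
lr = 0.1           # learning rate

# Forward pass: weighted sum, then activation, at each neuron.
h = sigmoid(x * w1)
out = sigmoid(h * w2)
error = target - out

# Unit error (delta) for the last neuron.
delta_out = out * (1 - out) * error

# Pass the delta back through the connecting weight to the hidden neuron.
delta_h = h * (1 - h) * (w2 * delta_out)

# Change in weight = learning rate * delta at receiving neuron * sender output.
w2 += lr * delta_out * h
w1 += lr * delta_h * x

# A new forward pass with the updated weights gives a smaller error.
new_error = target - sigmoid(sigmoid(x * w1) * w2)
print(abs(new_error) < abs(error))  # True
```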

15
Q

What is the most basic form of neural network? Name three other types.

A

A fully connected neural network.

Recurrent Neural Network (RNN)
Convolutional Neural Network (CNN)
Generative Adversarial Network (GAN)

16
Q

What is RNN?

A

A Recurrent Neural Network is designed for sequence prediction problems: it processes not only the current input but also prior inputs across time. Essentially a string of neural networks that feed into each other. E.g. autofill.

17
Q

What are CNNs? What should it be used for?

A

Convolutional Neural Networks are designed to map image data to an output variable. They develop an internal representation of a 2-D image, allowing the model to learn position- and scale-invariant structures in the data.

Use for: Image Data, Classification prediction problems, regression prediction problems.

17
Q

What should RNN be used for? And not used for?

A

Should be used for:
- Text data
- Speech data
- Classification prediction problems
- Regression prediction problems

Don’t use for:
- Tabular data
- Image data

18
Q

What are GAN’s?

A

Generative Adversarial Networks pit two models, a generator and a discriminator, against each other in a tight feedback loop.

The generator creates lots of images and the discriminator tries to determine which are real. The generator then tries to make its generated images look more real, and so on.

After many iterations the discriminator is no longer needed. This process can also be done with unlabelled data.

19
Q

Who invented GAN’s and when?

A

Ian Goodfellow, 2014

20
Q

What is most common hardware used for DL?

A

GPU (but many companies are making their own)

21
Q

In general, to what kind of tasks is deep learning applicable?

A

Works well in domains in which there are a large number of input features and large training datasets available.

22
Q

Which neural network is best for speech recognition and translation?

A

RNN (speech and text are sequence data)

23
Q

Which neural network is best for autonomous driving?

A

CNN (camera image data)

24
Q

Which neural network is best for online advertising?

A

Standard neural network
25
Q

Which neural network is best for photo tagging?

A

CNN
26
Q

What is feed-forward (a common NN feature)?

A

Feed-forward networks organise their neurons into layers, with the output from each layer being fed onwards to the next layer for further processing.
27
Q

What are some key characteristics of CNNs?

A

- Feed-forward
- Learning via back-propagation
- Maintain spatial integrity of images
- Extract features through convolutional filters
- Feature map enhancement (via ReLU)
- Dimensionality reduction via pooling
28
Q

What is spatial integrity?

A

How pixels combine to create features.
29
Q

What is the key challenge when maintaining spatial integrity of images?

A

Input size. For a 1000 x 1000 pixel image we must store 1000 x 1000 x 3 values (assuming a colour image) = 3 million inputs.
30
Q

What is the key benefit of a CNN?

A

CNNs allow us to work with larger images as they use fewer weights than a fully connected neural network.
31
Q

What are the five stages of a CNN?

A

- Input image
- Convolution
- Pooling
- Flattening
- Normal ANN then processes
32
Q

What are convolutional filters?

A

A set of weights applied to pixel values in the input image. These are learned and refined by back-propagation during training. The filtering process involves sliding a convolutional filter over an image to generate a feature map. Filters can accentuate or dampen features, and the combination of feature maps powers a CNN's predictions.
33
Q

What is a feature map? Give another name. How does it vary across layers?

A

The output of a convolutional filter: a filtered version of the input image. Using many such maps, predictions can be made. Also called activation maps. Early layers detect low-level features (e.g. edges) while later layers detect high-level features (e.g. shapes).
34
Q

Briefly, how is a convolutional filter applied?

A

Take the input image and, starting in the top-left corner, place an n x n convolutional filter there and calculate. Then move across by the stride; when the end of a row is reached, move down the input instead. The result is the output feature map.
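The sliding process above, sketched in plain Python for a single-channel image with no padding (the image and filter values are illustrative):

```python
def convolve2d(image, kernel, stride=1):
    """Slide an n x n filter over the image to produce a feature map."""
    n = len(kernel)
    rows = (len(image) - n) // stride + 1
    cols = (len(image[0]) - n) // stride + 1
    feature_map = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Multiply the filter weights against the pixels under them and sum.
            total = sum(image[r * stride + i][c * stride + j] * kernel[i][j]
                        for i in range(n) for j in range(n))
            row.append(total)
        feature_map.append(row)
    return feature_map

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[0.25, 0.25],   # a simple 2x2 averaging filter
          [0.25, 0.25]]
print(convolve2d(image, kernel))  # [[3.0, 4.0], [6.0, 7.0]]
```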
35
Q

Discuss filter size and stride size.

A

Larger filters take into account features spanning multiple pixels, while smaller filters are more reusable across an image. A longer stride results in smaller feature maps; a smaller stride won't miss important features.
36
Q

Why add padding?

A

Without padding, pixels on the edge of the image are only partially processed and the result of the convolution will be smaller than the original image size.
37
Q

Give the formula for the result size of a convolution.

A

((W - F + 2P) / S) + 1, where W is the input image size, F is the filter size, P is the padding and S is the stride.
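The formula is easy to check in code; the example sizes here are illustrative:

```python
def conv_output_size(W, F, P=0, S=1):
    """Output size of a convolution: ((W - F + 2P) / S) + 1."""
    return (W - F + 2 * P) // S + 1

print(conv_output_size(1000, 3, P=1, S=1))  # 1000: padding preserves the size
print(conv_output_size(28, 5))              # 24: no padding shrinks the image
print(conv_output_size(28, 5, P=0, S=2))    # 12: a longer stride shrinks it more
```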
38
Q

Explain feature map enhancement via ReLU.

A

ReLU zeroes out negative activations. Nodes that are positively activated are likely detecting meaningful features compared to those not activated, so the CNN does not waste effort processing irrelevant features.
39
Q

What is pooling?

A

Pooling happens after ReLU and reduces the size of the feature map without significant loss of information. E.g. max pooling: take the maximum pixel value within the filter window.
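Max pooling as described, sketched for a 2 x 2 window with stride 2 (the feature-map values are illustrative):

```python
def max_pool(feature_map, size=2, stride=2):
    """Keep only the maximum value within each pooling window."""
    rows = (len(feature_map) - size) // stride + 1
    cols = (len(feature_map[0]) - size) // stride + 1
    return [[max(feature_map[r * stride + i][c * stride + j]
                 for i in range(size) for j in range(size))
             for c in range(cols)]
            for r in range(rows)]

fm = [[1, 3, 2, 4],
      [5, 6, 1, 2],
      [7, 2, 9, 0],
      [4, 8, 3, 1]]
print(max_pool(fm))  # [[6, 4], [8, 9]] -- a 4x4 map reduced to 2x2
```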
40
Q

What is flattening?

A

Flatten the pooled feature map into a single 1-D column, which works well as input to a normal ANN.
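Flattening is a simple reshape; a minimal sketch using an illustrative pooled map:

```python
def flatten(feature_maps):
    """Flatten a list of pooled 2-D feature maps into one 1-D column."""
    return [value for fmap in feature_maps for row in fmap for value in row]

pooled = [[[6, 4],
           [8, 9]]]          # one 2x2 pooled feature map
print(flatten(pooled))  # [6, 4, 8, 9] -- ready as input for a normal ANN
```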