2 - Advanced Learning Algorithms Flashcards

(92 cards)

1
Q

Fill in the gaps

The convention used in Linear Algebra and re-used in this course is to use upper case variable names for ………. and lower case variable names for ……….

A

The convention used in Linear Algebra and re-used in this course is to use upper case variable names for MATRICES and lower case variable names for VECTORS or SCALARS.

2
Q

Which of the following activation functions is the most common choice for the hidden layers of a neural network?

A

ReLU
Rectified Linear Unit

3
Q

What do people mean when they say the hidden layers of a neural network do not employ an activation function?

A

This means the layer applies the linear (identity) activation g(z) = z, so each neuron simply outputs its weighted sum of inputs plus bias, w·x + b, unchanged.

4
Q

At a high-level, what are the three key steps to train an artificial neural network?

A
  1. Specify the model f_wb(X)
  2. Specify loss and cost L(f_wb(X),y)
  3. Train on data to minimize J(w,b)
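These three steps can be sketched end-to-end in plain NumPy for a single sigmoid neuron; the toy dataset and helper names below are illustrative assumptions, not from the course:

```python
import numpy as np

# Toy binary dataset: y = 1 when the feature sum is positive
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Step 1: specify the model f_wb(X) -- here a single sigmoid neuron
def model(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

# Step 2: specify loss and cost -- binary cross-entropy averaged over examples
def cost(X, y, w, b):
    f = model(X, w, b)
    return -np.mean(y * np.log(f + 1e-12) + (1 - y) * np.log(1 - f + 1e-12))

# Step 3: train on data to minimise J(w, b), here with basic gradient descent
w, b, alpha = np.zeros(2), 0.0, 0.5
for _ in range(500):
    err = model(X, w, b) - y           # shape (200,)
    w -= alpha * (X.T @ err) / len(y)  # dJ/dw
    b -= alpha * err.mean()            # dJ/db
```

After training, the cost is far below its initial value of ln 2 and the neuron separates the two classes on this toy data.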
5
Q

With respect to neural networks, what is an epoch?

A

It refers to one complete pass of the entire training dataset through the learning algorithm. In other words, when all the data samples have been exposed to the neural network for learning patterns, one epoch is said to be completed.

6
Q

Describe the three key steps involved in training a neural network in TensorFlow.

A

The first step involves specifying the model architecture, defining the layers and their connections for inference. Second, the model is compiled by choosing a suitable loss function, such as binary cross-entropy. Finally, the fit function is called to train the model on the provided dataset for a specified number of epochs, optimising the parameters to minimise the chosen loss.
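In TensorFlow/Keras code, the three steps might look like this minimal sketch; the layer sizes, optimiser choice, and toy data are illustrative assumptions:

```python
import numpy as np
import tensorflow as tf

# Step 1: specify the model architecture
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),  # linear output; sigmoid is folded into the loss
])

# Step 2: compile with a suitable loss (binary cross-entropy from logits)
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer="adam")

# Step 3: call fit to train for a specified number of epochs
X = np.random.default_rng(0).normal(size=(100, 2)).astype("float32")
y = (X[:, 0] > 0).astype("float32")
history = model.fit(X, y, epochs=5, verbose=0)
```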

7
Q

What is the primary architectural difference between a dense layer and a convolutional layer in a neural network?

A

In a dense layer, every neuron receives input from all activations in the preceding layer. In contrast, a convolutional layer’s neurons receive input only from a limited, specific region (or “window”) of the input from the previous layer, which helps in tasks like image processing or time-series analysis by focusing on local features.

8
Q

Explain the main advantages of using a ReLU activation function in hidden layers compared to a sigmoid activation function.

A

ReLU is computationally faster than sigmoid as it only involves max(0, z). More importantly, ReLU helps prevent the “vanishing gradient” problem because it only goes flat on one side (for negative values), whereas sigmoid goes flat on both extremes, slowing down gradient descent in those regions.
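A quick NumPy check makes the gradient argument concrete: the sigmoid's derivative is tiny at both extremes, while ReLU's derivative stays at 1 for all positive inputs (helper names here are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1 - s)  # peaks at 0.25, vanishes at both extremes

def relu_grad(z):
    return (z > 0).astype(float)  # 1 wherever the unit is active

z = np.array([-10.0, 0.0, 10.0])
print(sigmoid_grad(z))  # tiny at z = -10 and z = 10
print(relu_grad(z))     # 1 for positive z, so the gradient does not vanish
```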

9
Q

How does the Adam optimisation algorithm differ from basic gradient descent in its approach to learning rates?

A

Adam (Adaptive Moment Estimation) automatically adjusts the learning rate during training, unlike basic gradient descent which uses a single, fixed global learning rate. Furthermore, Adam uses a different, adaptive learning rate for each individual parameter of the model, accelerating learning by adjusting speeds based on the parameter’s movement (e.g., increasing if moving consistently in one direction, decreasing if oscillating).
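A minimal single-parameter sketch of the standard Adam update rule shows the adaptive, per-parameter step size; the hyperparameters below are the usual published defaults, not values from the course:

```python
import numpy as np

def adam_step(w, grad, m, v, t, alpha=0.01, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad       # moving average of gradients
    v = b2 * v + (1 - b2) * grad ** 2  # moving average of squared gradients
    m_hat = m / (1 - b1 ** t)          # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    # Step size adapts per parameter: large when gradients point consistently
    # in one direction, small when they oscillate.
    w -= alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimise J(w) = w**2 (gradient 2w) starting from w = 5
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * w, m, v, t)
```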

10
Q

Why is it crucial to use non-linear activation functions (i.e., not just linear activation) in the hidden layers of a neural network?

A

If only linear activation functions are used in all hidden layers, the entire neural network, regardless of its depth, would effectively reduce to a simple linear function (like linear regression). Non-linear activation functions allow the network to learn and represent complex, non-linear relationships in the data, which is essential for solving most real-world problems that linear models cannot address.
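The collapse to a single linear function is easy to verify numerically: two linear "layers" compose into one linear map with combined parameters (the shapes below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)
x = rng.normal(size=3)

# Two layers with linear activation g(z) = z ...
a1 = x @ W1 + b1
a2 = a1 @ W2 + b2

# ... are exactly one linear layer with combined parameters
W, b = W1 @ W2, b1 @ W2 + b2
assert np.allclose(a2, x @ W + b)  # depth added no expressive power
```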

11
Q

What is the binary cross-entropy loss function, and for what type of problem is it typically used?

A

The binary cross-entropy loss function measures the performance of a classification model whose output is a probability value between 0 and 1. It’s typically used for binary classification problems, where the target label Y can only take on two values (e.g., 0 or 1), by quantifying the difference between the predicted probabilities and the true labels.
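The formula is short enough to compute directly in NumPy; confident, correct predictions incur low loss, while coin-flip predictions incur higher loss (the example probabilities are illustrative):

```python
import numpy as np

def binary_cross_entropy(y_true, p):
    # L = -[y*log(p) + (1-y)*log(1-p)], averaged over examples
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([1.0, 0.0])
confident = np.array([0.99, 0.01])  # near-correct probabilities -> low loss
uncertain = np.array([0.5, 0.5])    # coin-flip probabilities -> higher loss
print(binary_cross_entropy(y, confident))
print(binary_cross_entropy(y, uncertain))  # -log(0.5) = 0.693...
```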

12
Q

Briefly explain the informal definition of a derivative as presented in the context of neural network learning.

A

Informally, a derivative (K) indicates how much a function’s output (J of W) changes when its input (W) changes by a tiny amount (epsilon). If W goes up by epsilon, and J(W) goes up by K * epsilon, then K is the derivative of J(W) with respect to W. This concept is crucial for understanding how small adjustments to parameters affect the cost function.
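This informal definition can be checked numerically on a simple cost, here J(w) = w²:

```python
# For J(w) = w**2, the analytic derivative at w = 3 is 2*w = 6
def J(w):
    return w ** 2

w, epsilon = 3.0, 1e-6
k = (J(w + epsilon) - J(w)) / epsilon  # if W goes up by epsilon, J goes up by ~k*epsilon
print(k)  # very close to 6
```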

13
Q

Distinguish between multiclass classification and multilabel classification, providing an example for each.

A

Multiclass classification problems involve predicting one label out of more than two possible discrete categories (e.g., classifying handwritten digits from 0-9, where an image is only one digit). Multilabel classification problems, on the other hand, involve predicting multiple labels for a single input simultaneously (e.g., identifying if a picture contains a car, a bus, and/or a pedestrian).

14
Q

Why is it recommended to use the from_logits=True argument when compiling a model with BinaryCrossentropy or CategoricalCrossentropy loss in TensorFlow?

A

Using from_logits=True allows TensorFlow to combine the activation function (like sigmoid or softmax) and the loss calculation into a single, numerically more stable operation. This helps to reduce numerical round-off errors, especially when dealing with very small or very large intermediate values, leading to more accurate and reliable training.
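The round-off problem can be demonstrated without TensorFlow: computing sigmoid first and then the log underflows for large logits, while the algebraically combined form (which is what `from_logits=True` enables the framework to use) stays finite:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def naive_bce(z, y):
    # sigmoid first, then log: 1 - sigmoid(z) rounds to 0 for large z
    p = sigmoid(z)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def stable_bce_from_logits(z, y):
    # combined form: max(z, 0) - y*z + log(1 + e^{-|z|})
    return np.maximum(z, 0) - y * z + np.log1p(np.exp(-abs(z)))

z, y = 40.0, 0.0
print(naive_bce(z, y))               # inf: log(1 - sigmoid(40)) underflows
print(stable_bce_from_logits(z, y))  # ~40.0, the correct loss
```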

15
Q

What is backpropagation, and why is it an important algorithm in neural network training?

A

Backpropagation is a key algorithm in neural network learning that efficiently computes the derivatives of the cost function with respect to all the model’s parameters. This is crucial because these derivatives are then used by optimisation algorithms like gradient descent or Adam to update the parameters, allowing the neural network to learn and minimise its cost function.

16
Q

What is Activation (a)?

A

The output value of an artificial neuron or a layer of neurons, often representing a probability or a transformed input

It is analogous to how much a biological neuron is ‘firing.’

17
Q

Define Activation Function (G).

A

A non-linear function applied to the output of each neuron’s linear combination of inputs and weights

Examples include the sigmoid function.

18
Q

What is an Artificial Neural Network (ANN)?

A

A computational model inspired by the structure and function of biological neural networks, designed to recognise patterns and relationships in data.

19
Q

What is the role of an Axon?

A

The output wire of a biological neuron that transmits electrical impulses to other neurons.

20
Q

What is Backward Propagation (Backpropagation)?

A

An algorithm used for training neural networks, which propagates errors backwards through the network to adjust weights and biases.

21
Q

Define Bias (b) in the context of neural networks.

A

A parameter in a neuron that acts as an offset, shifting the activation function’s output.

22
Q

What is the Cell Body (Nucleus) of a neuron?

A

The main part of a biological neuron where computations occur.

23
Q

What is a Column Vector?

A

A matrix with a single column (e.g., Nx1 matrix).

24
Q

What is Deep Learning?

A

A subfield of machine learning that uses neural networks with multiple layers to learn from data.

25
Define Dense Layer.
A type of neural network layer where each neuron in the layer is connected to every neuron in the preceding layer.
26
What are Dendrites?
The input wires of a biological neuron that receive electrical impulses from other neurons.
27
What is Feature Engineering?
The manual process of selecting and transforming raw data into features that can be used by a machine learning algorithm.
28
Explain Forward Propagation (Forward Prop).
The process of passing input data through the neural network, layer by layer, to compute the output or prediction.
29
What is a GPU (Graphics Processing Unit)?
A specialised electronic circuit originally designed to rapidly accelerate image rendering; its massively parallel design also makes it highly efficient at the matrix operations central to neural network training.
30
Define Hidden Layer.
Any layer in a neural network located between the input layer and the output layer.
31
What was the ImageNet Moment (2012)?
A significant event in computer vision where deep learning models achieved unprecedented accuracy in the ImageNet Challenge.
32
What is the Input Layer (Layer 0)?
The initial layer of a neural network that receives the raw input features for processing.
33
What is a Layer in neural networks?
A grouping of neurons that take similar inputs and collectively output a set of numbers.
34
Define Linear Regression.
A traditional machine learning algorithm used for predicting a continuous output value based on input features.
35
What is Logistic Regression?
A traditional machine learning algorithm used for binary classification, predicting the probability of an input belonging to a certain class.
36
What is a Matrix?
A rectangular array of numbers, symbols, or expressions, arranged in rows and columns.
37
Explain Matrix Multiplication (np.matmul).
A fundamental operation in linear algebra where two matrices are multiplied to produce a new matrix.
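A small example of `np.matmul` showing the shape rule (inner dimensions must match); the values are arbitrary:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])      # shape (3, 2)

# Inner dimensions agree: (2, 3) @ (3, 2) -> (2, 2).
# Each output entry is the dot product of a row of A with a column of B.
C = np.matmul(A, B)         # equivalently: A @ B
print(C)
```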
38
What is a Multi-layer Perceptron (MLP)?
Another term for a feedforward neural network with one or more hidden layers.
39
What is NumPy?
A fundamental Python library for numerical computing, providing support for large, multi-dimensional arrays.
40
What is the Output Layer?
The final layer of a neural network that produces the network's prediction or desired output.
41
What are Parameters (W, b) in neural networks?
The trainable components of a neural network, specifically the weights (W) and biases (b).
42
What is PyTorch?
Another leading open-source machine learning framework, widely used for deep learning research and development.
43
Define Row Vector.
A matrix with a single row (e.g., 1xN matrix).
44
What is Sequential (tf.keras.Sequential)?
A TensorFlow/Keras API that allows for the creation of neural network models by stacking layers.
45
What is the Sigmoid Function?
A common activation function that maps any real-valued number to a value between 0 and 1.
46
What is a Tensor?
TensorFlow's primary data structure, representing multi-dimensional arrays.
47
What is TensorFlow?
An open-source machine learning framework developed by Google, widely used for building and training neural networks.
48
What is a Unit in a neural network?
Another term for a neuron within a neural network layer.
49
Define Vectorisation.
The process of converting loop-based operations into highly optimised matrix operations.
50
What are Weights (W) in neural networks?
Parameters in a neuron that determine the strength or importance of each input feature.
51
What was the original motivation behind the invention of neural networks, and how has their relationship to that motivation evolved today?
The original motivation for neural networks was to mimic how the human brain learns and thinks. However, modern neural networks have diverged significantly from biological brain models, with current development primarily driven by engineering principles for effective algorithms, rather than strict biological mimicry.
52
Briefly describe the components and function of a simplified biological neuron, and how this concept is analogous to an artificial neuron.
A simplified biological neuron receives inputs via dendrites, performs computations in its cell body, and sends outputs via an axon. An artificial neuron models this by taking numerical inputs, performing a mathematical computation (often like logistic regression), and producing a single numerical output, also called an activation.
53
Name two significant factors that contributed to the resurgence and widespread adoption of neural networks and deep learning after 2005.
Two significant factors for the resurgence of neural networks after 2005 were the explosion in digital data availability, which large neural networks could effectively leverage, and the development of faster computer processors, especially GPUs, which are highly efficient at the matrix multiplications central to neural network computations.
54
In the context of demand prediction for t-shirts, explain what a "hidden layer" represents and why it is given that name.
In demand prediction, a "hidden layer" processes raw inputs to compute intermediate, more abstract features, such as affordability or awareness. It's called "hidden" because the "correct" values for these intermediate features are not explicitly provided in the training dataset; the network learns them autonomously.
55
How does a neural network, specifically in the context of face recognition, learn to extract increasingly complex features from raw input data?
A neural network learns to extract features hierarchically in tasks like face recognition. Earlier hidden layers detect simple features (e.g., edges), which are then combined by subsequent hidden layers to recognise more complex components (e.g., eyes, noses), ultimately forming complete object shapes in deeper layers.
56
Explain the purpose of the "activation function" (G) within a neuron and why it's a crucial part of the computation.
The activation function (G) in a neuron introduces non-linearity into the network, allowing it to learn complex, non-linear relationships in the data. It transforms the linear combination of inputs and weights into the neuron's output, determining its "activation" level and enabling the network to model intricate patterns.
57
What is the conventional way of counting the number of layers in a neural network, and what layer is typically not included in this count?
The conventional way of counting layers in a neural network includes all hidden layers and the output layer. The input layer (sometimes referred to as Layer 0) is typically not included in this count.
58
Describe the primary difference in how input feature vectors (like X) are typically represented in NumPy for traditional machine learning versus in TensorFlow for neural networks.
In NumPy for traditional machine learning, input feature vectors (X) were often represented as 1D arrays. In TensorFlow for neural networks, the convention is to represent input data, even for a single example, as 2D matrices (e.g., a 1xN row vector) to enhance internal computational efficiency.
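The difference is just one of array shape, as a quick NumPy sketch shows (the feature values and weight matrix are illustrative):

```python
import numpy as np

x_1d = np.array([200.0, 17.0])    # shape (2,)  -- typical NumPy-style ML input
x_2d = np.array([[200.0, 17.0]])  # shape (1, 2) -- TensorFlow's row-vector convention
print(x_1d.shape, x_2d.shape)

# The 2D form lets a single example and a whole batch share one code path:
W = np.ones((2, 3))
print((x_2d @ W).shape)  # (1, 3): a matrix of activations, one row per example
```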
59
What is the tf.keras.Sequential function used for in TensorFlow, and how does it simplify building neural network models compared to explicit layer-by-layer computation?
The tf.keras.Sequential function simplifies neural network building by allowing users to sequentially string together layers into a single model. This abstracts away explicit layer-by-layer forward propagation calls, streamlining the code for defining and running the network's computations.
60
Why is vectorisation, particularly through matrix multiplication, so important for the efficiency and scalability of modern neural networks?
Vectorisation, primarily through matrix multiplication, is crucial because it allows for the parallel processing of computations across many neurons and layers simultaneously. Modern hardware, especially GPUs, is highly optimised for these operations, significantly accelerating the training and inference of large neural networks.
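The equivalence between a per-neuron loop and one matrix multiplication can be sketched in NumPy (layer sizes are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A_in = rng.normal(size=(1, 3))  # activations from the previous layer
W = rng.normal(size=(3, 4))     # one column of weights per neuron
b = rng.normal(size=(1, 4))

# Loop version: compute each neuron's pre-activation one at a time
Z_loop = np.zeros((1, 4))
for j in range(4):
    Z_loop[0, j] = np.dot(A_in[0], W[:, j]) + b[0, j]

# Vectorised version: one matrix multiplication handles all neurons at once,
# which is the operation GPUs execute in parallel
Z_vec = np.matmul(A_in, W) + b
assert np.allclose(Z_loop, Z_vec)
```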
61
What is an Activation Function (g)?
A function applied to the weighted sum of inputs plus bias (Z) in a neuron, introducing non-linearity to the network's output.
62
Define Adam (Adaptive Moment Estimation)
An advanced optimization algorithm that adaptively adjusts the learning rate for each parameter.
63
What is Automatic Differentiation (Autodiff)?
Techniques used by machine learning frameworks to automatically compute derivatives of complex functions.
64
What is Backpropagation?
A key algorithm in neural network learning that efficiently computes the gradients of the cost function with respect to all the network's parameters.
65
What is Binary Classification?
A classification problem where the output label can take on one of two possible values, typically 0 or 1.
66
What is Binary Cross Entropy Loss?
A common loss function used for binary classification problems, measuring the dissimilarity between the predicted probabilities and the true binary labels.
67
Define Computation Graph
A graphical representation of the mathematical operations involved in computing a function.
68
What is a Convolutional Layer?
A type of neural network layer where each neuron processes input from a limited, local region of the previous layer.
69
What is a Convolutional Neural Network (CNN)?
A neural network architecture that incorporates one or more convolutional layers, widely used for image and sequential data processing.
70
What is the Cost Function (J)?
A measure of the overall error or discrepancy between the neural network's predictions and the true labels across the entire training dataset.
71
Define Dense Layer
A standard neural network layer where every neuron in the layer receives input from all neurons in the previous layer.
72
What is a Derivative?
A mathematical concept that quantifies the rate at which a function's output changes with respect to a small change in its input.
73
What does an Epoch refer to?
One complete pass through the entire training dataset during the learning process.
74
What is a Fit Function?
A function in machine learning libraries used to train a model on a given dataset, minimizing the specified cost function.
75
Define Forward Propagation (Inference)
The process of passing input data through the neural network layers to compute the network's prediction.
76
What does 'from_logits=True' signify?
An argument in TensorFlow's loss functions that instructs the framework to expect raw, unnormalized scores as input.
77
What is Gradient Descent?
An iterative optimization algorithm used to find the minimum of a cost function by taking steps proportional to the negative of the gradient.
78
Define Hidden Layer
A layer of neurons in a neural network located between the input and output layers.
79
What is Learning Rate (alpha)?
A hyperparameter in optimization algorithms that determines the step size taken in the direction of the negative gradient.
80
What is a Linear Activation Function?
An activation function where the output is equal to the input (g(Z) = Z).
81
What is a Loss Function?
A measure of how well a learning algorithm performs on a single training example.
82
Define Mean Squared Error (MSE) Loss
A common loss function used for regression problems, calculating the average of the squared differences between predicted and true values.
83
What is Multiclass Classification?
A classification problem where the output label can take on one of more than two mutually exclusive categories.
84
What is Multilabel Classification?
A classification problem where a single input can be associated with multiple output labels simultaneously.
85
Define Neural Network
A computational model inspired by the structure of the human brain, consisting of interconnected layers of artificial neurons.
86
What is Numerical Roundoff Error?
The error that occurs in computer calculations due to the finite precision with which numbers are stored.
87
What is Overfitting?
A phenomenon where a model learns the training data too well, leading to poor generalization performance on unseen data.
88
What are Parameters (W, B)?
The trainable components of a neural network, consisting of weights (W) and biases (B).
89
What is ReLU (Rectified Linear Unit)?
A popular non-linear activation function defined as g(Z) = max(0, Z).
90
What is a Sigmoid Activation Function?
A non-linear activation function that squashes its input to a value between 0 and 1.
91
Define Softmax Regression
A generalization of logistic regression to multiclass classification problems, which outputs a probability distribution over multiple classes.
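The softmax function at the heart of softmax regression is a few lines of NumPy; the max-subtraction trick shown here is a standard stability measure, not course-specific:

```python
import numpy as np

def softmax(z):
    # Subtracting max(z) avoids overflow without changing the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])  # raw scores (logits) for 3 classes
p = softmax(z)
print(p, p.sum())  # a probability distribution: entries in (0, 1), summing to 1
```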
92
What is TensorFlow?
An open-source machine learning framework developed by Google, widely used for building and training neural networks.