Softmax
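Softmax maps a vector of real-valued scores (logits) to a probability distribution: softmax(z)_i = exp(z_i) / sum_j exp(z_j). It is the standard output layer for multi-class classification. A minimal NumPy sketch, numerically stabilized by subtracting the max logit before exponentiating:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = z - np.max(z, axis=-1, keepdims=True)  # guard against overflow in exp
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)

print(softmax(np.array([2.0, 1.0, 0.1])))  # -> [0.659 0.242 0.099], sums to 1
```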
ReLU
What are the benefits of ReLU?
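ReLU(x) = max(0, x): it passes positive inputs through unchanged and zeroes out the rest. Its main benefits: it is very cheap to compute (one comparison, no exponentials), its gradient is exactly 1 for positive inputs so it does not saturate there (which mitigates vanishing gradients in deep networks), and it yields sparse activations. The main drawback is the "dying ReLU" problem, where units stuck in the negative region stop learning; Leaky ReLU is a common fix. A short NumPy sketch of both:

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    """Elementwise max(0, x)."""
    return np.maximum(0.0, x)

def leaky_relu(x: np.ndarray, alpha: float = 0.01) -> np.ndarray:
    """Keeps a small slope alpha for negative inputs so units never fully die."""
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # [0.  0.  0.  1.5]
print(leaky_relu(x))  # [-0.02  -0.005  0.     1.5  ]
```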
Sigmoid
tanh
What’s faster to train with, sigmoid or tanh?
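Sigmoid squashes inputs into (0, 1) via s(x) = 1 / (1 + e^(-x)); tanh squashes into (-1, 1) and is a rescaled sigmoid, tanh(x) = 2s(2x) - 1. Tanh usually trains faster: its outputs are zero-centered, so downstream layers see activations with roughly zero mean and better-conditioned gradients, and its derivative peaks at 1 versus sigmoid's 0.25. Both saturate for large |x|, which is why ReLU often beats either in deep networks. A small sketch comparing the two and their gradients:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # peaks at 0.25 when x = 0

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2  # peaks at 1.0 when x = 0

x = np.array([-4.0, 0.0, 4.0])
print(sigmoid(x))       # [0.018  0.5    0.982] -> strictly positive, not zero-centered
print(np.tanh(x))       # [-0.999 0.     0.999] -> zero-centered
print(sigmoid_grad(x))  # [0.0177 0.25   0.0177]
print(tanh_grad(x))     # [0.0013 1.     0.0013] -> 4x larger at 0, saturates harder at the tails
```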
Vanishing gradients
Occurs when the gradients used to update neural network weights during training become extremely small as they propagate backward through the network. Backpropagation multiplies one derivative factor per layer, so repeated factors below 1 (sigmoid's gradient never exceeds 0.25) shrink the signal exponentially with depth, and early layers learn very slowly or not at all.
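A toy illustration (an assumed setup: a chain of scalar sigmoid units with all weights fixed at 1) showing how the backpropagated gradient shrinks exponentially with depth:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

depth = 20
x = 0.5
activations = []
for _ in range(depth):           # forward pass through 20 sigmoid "layers"
    x = sigmoid(x)
    activations.append(x)

grad = 1.0
for a in reversed(activations):  # backward pass: one derivative factor per layer
    grad *= a * (1.0 - a)        # sigmoid'(z) = s(z) * (1 - s(z)) <= 0.25

print(f"gradient after {depth} layers: {grad:.3e}")  # on the order of 1e-13
```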