Softmax
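Softmax maps a vector of real-valued scores (logits) to a probability distribution: softmax(z)_i = exp(z_i) / sum_j exp(z_j). It is the standard output layer for multi-class classification. A minimal NumPy sketch, numerically stabilized by subtracting the max logit before exponentiating:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = z - np.max(z, axis=-1, keepdims=True)  # guard against overflow in exp
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)

print(softmax(np.array([2.0, 1.0, 0.1])))  # -> [0.659 0.242 0.099], sums to 1
```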
ReLU
What are the benefits of ReLU?
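ReLU(x) = max(0, x): it passes positive inputs through unchanged and zeroes out the rest. Its main benefits: it is very cheap to compute (one comparison, no exponentials), its gradient is exactly 1 for positive inputs so it does not saturate there (which mitigates vanishing gradients in deep networks), and it yields sparse activations. The main drawback is the "dying ReLU" problem, where units stuck in the negative region stop learning; Leaky ReLU is a common fix. A short NumPy sketch of both:

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    """Elementwise max(0, x)."""
    return np.maximum(0.0, x)

def leaky_relu(x: np.ndarray, alpha: float = 0.01) -> np.ndarray:
    """Keeps a small slope alpha for negative inputs so units never fully die."""
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # [0.  0.  0.  1.5]
print(leaky_relu(x))  # [-0.02  -0.005  0.     1.5  ]
```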
Sigmoid
tanh
What’s faster to train with, sigmoid or tanh?
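Sigmoid squashes inputs into (0, 1) via s(x) = 1 / (1 + e^(-x)); tanh squashes into (-1, 1) and is a rescaled sigmoid, tanh(x) = 2s(2x) - 1. Tanh usually trains faster: its outputs are zero-centered, so downstream layers see activations with roughly zero mean and better-conditioned gradients, and its derivative peaks at 1 versus sigmoid's 0.25. Both saturate for large |x|, which is why ReLU often beats either in deep networks. A small sketch comparing the two and their gradients:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # peaks at 0.25 when x = 0

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2  # peaks at 1.0 when x = 0

x = np.array([-4.0, 0.0, 4.0])
print(sigmoid(x))       # [0.018  0.5    0.982] -> strictly positive, not zero-centered
print(np.tanh(x))       # [-0.999 0.     0.999] -> zero-centered
print(sigmoid_grad(x))  # [0.0177 0.25   0.0177]
print(tanh_grad(x))     # [0.0013 1.     0.0013] -> 4x larger at 0, saturates harder at the tails
```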
Vanishing gradients
Occurs when the gradients used to update neural network weights during training become extremely small as they propagate backward through the network. Backpropagation multiplies one derivative factor per layer, so repeated factors below 1 (sigmoid's gradient never exceeds 0.25) shrink the signal exponentially with depth, and early layers learn very slowly or not at all.
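A toy illustration (an assumed setup: a chain of scalar sigmoid units with all weights fixed at 1) showing how the backpropagated gradient shrinks exponentially with depth:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

depth = 20
x = 0.5
activations = []
for _ in range(depth):           # forward pass through 20 sigmoid "layers"
    x = sigmoid(x)
    activations.append(x)

grad = 1.0
for a in reversed(activations):  # backward pass: one derivative factor per layer
    grad *= a * (1.0 - a)        # sigmoid'(z) = s(z) * (1 - s(z)) <= 0.25

print(f"gradient after {depth} layers: {grad:.3e}")  # on the order of 1e-13
```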