Deep Learning Flashcards

Question

Implement gradient descent in Python.

Answer 1

* Mini-batch gradient descent is more efficient than stochastic gradient descent.
* It tends to generalize better by finding flat minima.
* Mini-batches help approximate the gradient of the entire training set, which helps to avoid local minima.
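
The idea that a mini-batch gradient approximates the full-batch gradient can be sketched as follows. This is a minimal illustration, not a reference implementation: the helper names `iterate_minibatches` and `mse_gradient`, the one-parameter model `y_hat = w * x`, and the toy data are all assumptions made for the example.

```python
import random

def iterate_minibatches(samples, batch_size, seed=0):
    """Yield shuffled mini-batches that together cover `samples` once (one epoch)."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [samples[i] for i in indices[start:start + batch_size]]

def mse_gradient(w, batch):
    """Gradient of the mean squared error w.r.t. w for the model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

# Hypothetical toy data following y = 3x.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
w = 0.0

full_grad = mse_gradient(w, data)
batch_grads = [mse_gradient(w, batch) for batch in iterate_minibatches(data, batch_size=4)]
```

Each entry of `batch_grads` is a noisy estimate of `full_grad`. With equally sized batches, their average equals the full-batch gradient.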

Answer 2

* Initialize the weights and biases randomly.
* Pass an input through the network and get values from the output layer.
* Calculate the error between the actual value and the predicted value.
* Go to each neuron that contributes to the error and change its weights to reduce the error.
* Repeat until you find the best weights for the network.
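
The steps above can be sketched for the simplest possible case, a single linear neuron trained with batch gradient descent on a mean squared error. The function name `train_linear_neuron`, the learning rate, the epoch count, and the toy data are assumptions chosen for illustration.

```python
import random

def train_linear_neuron(data, learning_rate=0.05, epochs=200, seed=0):
    """Fit y_hat = w * x + b by gradient descent on the mean squared error."""
    rng = random.Random(seed)
    # Step 1: initialize a random weight and bias.
    w = rng.uniform(-0.5, 0.5)
    b = rng.uniform(-0.5, 0.5)
    n = len(data)
    for _ in range(epochs):  # Step 5: repeat.
        grad_w = 0.0
        grad_b = 0.0
        for x, y in data:
            y_hat = w * x + b           # Step 2: forward pass.
            error = y_hat - y           # Step 3: prediction error.
            grad_w += 2 * error * x / n
            grad_b += 2 * error / n
        # Step 4: move each parameter against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical toy data following y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
w, b = train_linear_neuron(data)
```

On this data the learned `w` and `b` approach 2 and 1. A real network repeats the same loop with backpropagation supplying the gradients for every layer.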

Answer 3

Hyperparameters are the **variables which determine the network structure** (e.g., number of hidden units) and the **variables which determine how the network is trained** (e.g., learning rate). Hyperparameters are set before training.
* Number of Hidden Layers
* Network Weight Initialization
* Activation Function
* Learning Rate
* Momentum
* Number of Epochs
* Batch Size

Answer 4

**Network Hyperparameters**
* **The number of hidden layers and units**: Many hidden units within a layer, combined with regularization techniques, can increase accuracy. Too few units may cause underfitting.
* **Network Weight Initialization**: Ideally, it may be better to use different weight initialization schemes according to the activation function used on each layer. A uniform distribution is used most often.
* **Activation function**: Activation functions introduce nonlinearity into models, which allows deep learning models to learn nonlinear prediction boundaries.
**Training Hyperparameters**
* **Learning Rate**: The learning rate defines how quickly a network updates its parameters. A low learning rate slows down learning but converges smoothly. A larger learning rate speeds up learning but may not converge.
* **Momentum**: Momentum uses knowledge of the previous steps to inform the direction of the next step. It helps to prevent oscillations. A typical choice of momentum is between 0.5 and 0.9.
* **The number of epochs**: The number of epochs is the number of times the whole training set is shown to the network during training. Increase the number of epochs until the validation accuracy starts decreasing even while the training accuracy is still increasing (overfitting).
* **Batch size**: The mini-batch size is the number of samples given to the network before a parameter update happens. A good default batch size might be 32. Also try 64, 128, 256, and so on.
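
How the learning rate and momentum interact can be sketched with a momentum update on a one-dimensional quadratic. The function name `sgd_momentum`, the objective `(w - 3)^2`, and the specific values are assumptions for illustration. The update shown is the classical "velocity" form of momentum, which is one common formulation among several.

```python
def sgd_momentum(grad_fn, w0, learning_rate=0.1, momentum=0.9, steps=200):
    """Gradient descent with classical momentum on a single parameter."""
    w = w0
    velocity = 0.0
    for _ in range(steps):
        # The velocity blends the previous step with the new gradient.
        velocity = momentum * velocity - learning_rate * grad_fn(w)
        w += velocity
    return w

def gradient(w):
    """Gradient of the hypothetical objective f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

# Moderate learning rate with momentum: settles near the minimum at w = 3.
w_converged = sgd_momentum(gradient, w0=0.0)

# Learning rate too large (no momentum): every step overshoots further.
w_diverged = sgd_momentum(gradient, w0=0.0, learning_rate=1.1, momentum=0.0, steps=50)
```

The second call illustrates the warning above: with a learning rate that is too large, the iterates move away from the minimum instead of toward it.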

Answer 5

The reasons for this could be:
* The learning rate is too low
* The regularization parameter is too high
* The optimizer is stuck at a local minimum

Answer 6

**Long short-term memory** (LSTM) is an artificial recurrent neural network architecture used in the field of deep learning. Unlike standard feedforward neural networks, an LSTM has feedback connections that make it a “general purpose computer”. **It can process not only single data points, but also entire sequences of data.** LSTMs are a special kind of recurrent neural network, **capable of learning long-term dependencies.**

Answer 7

A perceptron **receives multiple inputs, applies various transformations and functions, and provides an output**. It is a **linear model used for binary classification**. It models a neuron that has a set of inputs, each of which is given a specific weight. The neuron computes some function of these weighted inputs and gives the output.

Answer 8

1. Initialize the weights and threshold.
2. Provide the input and calculate the output.
3. Update the weights.
4. Repeat steps 2 and 3.
## Footnote
* **Wj(t+1)**: Updated weight
* **Wj(t)**: Old weight
* **d**: Desired output
* **y**: Actual output
* **x**: Input
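
The four steps can be sketched with the standard perceptron learning rule, `Wj(t+1) = Wj(t) + lr * (d - y) * xj`. The card does not show the update formula itself, so the rule, the learning rate `lr`, the treatment of the threshold as a bias term, and the AND-gate data are all assumptions for this example.

```python
def predict(weights, bias, x):
    """Step 2: weighted sum of the inputs followed by a step function."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation >= 0 else 0

def train_perceptron(samples, learning_rate=1.0, epochs=20):
    """Train on (inputs, desired_output) pairs using the perceptron rule."""
    # Step 1: initialize the weights and the threshold (stored as a bias).
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):  # Step 4: repeat steps 2 and 3.
        for x, d in samples:
            y = predict(weights, bias, x)
            # Step 3: Wj(t+1) = Wj(t) + lr * (d - y) * xj
            update = learning_rate * (d - y)
            weights = [w + update * xi for w, xi in zip(weights, x)]
            bias += update
    return weights, bias

# Hypothetical training set: the logical AND of two binary inputs.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
```

Because AND is linearly separable, the weights stop changing after a few epochs and `predict` then reproduces every row of `and_gate`.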

Answer 9

A **multilayer perceptron** (MLP) is a deep artificial neural network composed of more than one perceptron. It consists of an input layer to receive the signal, an output layer that makes a decision or prediction about the input, and, in between those two, an arbitrary number of hidden layers that are the true computational engine of the MLP.

Answer 10

* **Input Nodes**: The input nodes provide information from the outside world to the network and are together referred to as the “Input Layer”. No computation is performed in any of the input nodes; they just pass the information on to the hidden nodes.
* **Hidden Nodes**: The hidden nodes perform computations and transfer information from the input nodes to the output nodes. A collection of hidden nodes forms a “Hidden Layer”. While a network will only have a single input layer and a single output layer, it can have zero or multiple hidden layers.
* **Output Nodes**: The output nodes are collectively referred to as the “Output Layer” and are responsible for computations and for transferring information from the network to the outside world.
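
The flow of information from input nodes through hidden nodes to output nodes can be sketched as a forward pass. The layer sizes, the weight values, the sigmoid activation, and the helper names `dense` and `forward` are assumptions chosen only to make the example concrete.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One fully connected layer: each row of `weights` feeds one node."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass `inputs` through each (weights, biases) layer in order."""
    activations = inputs  # Input nodes do no computation; they pass values on.
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations

# Hypothetical network: 2 input nodes, 3 hidden nodes, 1 output node.
hidden_layer = ([[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1])
output_layer = ([[0.7, -0.2, 0.5]], [0.05])

prediction = forward([1.0, 0.0], [hidden_layer, output_layer])
```

Adding more `(weights, biases)` pairs to the list adds more hidden layers without changing `forward`.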

Answer 11

* Single-layer perceptrons **cannot classify non-linearly separable data points**.
* Complex problems that involve **a lot of parameters cannot be solved** by single-layer perceptrons.

Answer 12

An **autoencoder** is a simple 3-layer neural network where output units are directly connected back to input units. Typically, the number of hidden units is much smaller than the number of visible ones. The task of training is to minimize the reconstruction error, i.e. to find the most efficient compact representation of the input data.
An **RBM** shares a similar idea, but it **uses stochastic units with a particular distribution** instead of deterministic units. The task of training is to find out how these two sets of variables are actually connected to each other. One aspect that distinguishes an **RBM** from other autoencoders is that it **has two biases**. The **hidden bias helps the RBM produce the activations** on the forward pass, while the **visible layer’s biases help the RBM learn the reconstructions** on the backward pass.

Answer 13

A Restricted Boltzmann Machine is an **undirected graphical model** that has played a major role in deep learning in recent times. It is an algorithm that is useful for **dimensionality reduction**, classification, regression, collaborative filtering, feature learning, and topic modeling.

Answer 14

**Recurrent Neural Networks** are a type of artificial neural network designed to **recognize patterns in sequences of data, such as text, genomes, handwriting, the spoken word, and numerical time series data.** Recurrent Neural Networks **use the backpropagation** algorithm for training. Because of their **internal memory**, RNNs are able to remember important things about the input they received, which **enables them to be very precise in predicting what’s coming next**.
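
The "internal memory" can be sketched with a scalar recurrent cell whose hidden state is fed back at every step. The function name `rnn_forward`, the tanh activation, and the weight values are assumptions for illustration; practical RNNs use vectors and matrices instead of single numbers.

```python
import math

def rnn_forward(sequence, w_input, w_hidden, bias):
    """Scalar recurrent cell: h_t = tanh(w_input * x_t + w_hidden * h_prev + bias)."""
    hidden = 0.0  # Initial memory is empty.
    states = []
    for x in sequence:
        # The new state depends on the current input and the previous state.
        hidden = math.tanh(w_input * x + w_hidden * hidden + bias)
        states.append(hidden)
    return states

# Two hypothetical sequences that differ only in their first element.
states_with_signal = rnn_forward([1.0, 0.0, 0.0], w_input=1.0, w_hidden=0.5, bias=0.0)
states_all_zero = rnn_forward([0.0, 0.0, 0.0], w_input=1.0, w_hidden=0.5, bias=0.0)
```

The last state of `states_with_signal` is still nonzero even though the last two inputs are zero, which shows the cell carrying information about an earlier input forward in time.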

Answer 15

Recurrent Neural Networks **use the backpropagation** algorithm for training, but it is applied at every time step. This is commonly known as **Backpropagation Through Time** (BPTT). Backpropagation has some issues, such as:
* Vanishing gradient
* Exploding gradient

Answer 16

* It has platform flexibility.
* It is easily trainable on CPU as well as GPU for distributed computing.
* TensorFlow has auto-differentiation capabilities.
* It has advanced support for threads, asynchronous computation, and queues.
* It is customizable and open source.

Answer 17

Tensors are the **de facto standard for representing data in deep learning**. They are just **multidimensional arrays** that allow you to represent data with higher dimensions. In deep learning, you generally deal with high-dimensional data sets, where dimensions refer to the different features present in the data set.
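
The notion of a tensor as a multidimensional array can be sketched with plain nested lists. The helper `shape` and the example values are assumptions for illustration; libraries such as TensorFlow or NumPy provide their own array types with a built-in shape attribute.

```python
def shape(tensor):
    """Return the shape of a rectangular nested list as a tuple."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0] if tensor else None
    return tuple(dims)

scalar = 5.0                                      # rank 0
vector = [1.0, 2.0, 3.0]                          # rank 1
matrix = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]     # rank 2

# Rank 4: a hypothetical batch of 2 images, each 4x4 pixels with 3 channels.
image_batch = [
    [[[0.0] * 3 for _ in range(4)] for _ in range(4)]
    for _ in range(2)
]
```

Here `shape(matrix)` is `(3, 2)` and `shape(image_batch)` is `(2, 4, 4, 3)`. The number of entries in the shape is the rank of the tensor.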

Answer 18

During backpropagation, the **gradients tend to get smaller and smaller as we keep moving backward through the network.** This means that the **neurons in the earlier layers learn very slowly compared to the neurons in the later layers.** Earlier layers are important because they are responsible for learning and detecting the simple patterns and are the building blocks of the network. If they give improper and inaccurate results, we cannot expect the next layers and the complete network to perform well and produce accurate results. As a result, the training process takes too long and the prediction accuracy of the model decreases.
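
The shrinking of gradients toward the earlier layers can be sketched with a chain of scalar sigmoid layers. The function name `layer_gradients`, the depth of 10, and the shared weight of 1.0 are assumptions for illustration. The sketch relies on the fact that the sigmoid derivative is at most 0.25, so each layer multiplies the backward signal by a small factor.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_gradients(x, weight=1.0, depth=10):
    """Gradient of the final output w.r.t. the input of each layer.

    Index 0 is the earliest layer, index depth - 1 is the last layer.
    """
    # Forward pass: store each layer's local derivative d(a_k)/d(a_{k-1}).
    local_derivatives = []
    activation = x
    for _ in range(depth):
        activation = sigmoid(weight * activation)
        local_derivatives.append(weight * activation * (1.0 - activation))

    # Backward pass: the chain rule multiplies local derivatives together.
    gradients = [0.0] * depth
    upstream = 1.0
    for k in reversed(range(depth)):
        upstream *= local_derivatives[k]
        gradients[k] = upstream
    return gradients

gradients = layer_gradients(x=0.5)
```

`gradients[0]`, the signal reaching the earliest layer, is several orders of magnitude smaller than `gradients[-1]`, which is why the earliest layers learn so slowly.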

Answer 19

Weight initialization is a very important step. A **bad weight initialization can prevent a network from learning**, but a **good weight initialization helps in giving quicker convergence and a better overall error**. **Biases can generally be initialized to zero**. The rule for setting the **weights is to be close to zero without being too small**.
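
One way to get weights that are "close to zero without being too small" is to draw them from a uniform range scaled by the number of inputs to the layer. This is a sketch of one common heuristic, not the only scheme: the function name `init_layer` and the `1 / sqrt(fan_in)` limit are assumptions for illustration.

```python
import math
import random

def init_layer(fan_in, fan_out, seed=0):
    """Return (weights, biases) for a layer with fan_in inputs and fan_out nodes."""
    rng = random.Random(seed)
    # Assumed heuristic: shrink the range as the number of inputs grows.
    limit = 1.0 / math.sqrt(fan_in)
    weights = [
        [rng.uniform(-limit, limit) for _ in range(fan_in)]
        for _ in range(fan_out)
    ]
    biases = [0.0] * fan_out  # Biases can generally start at zero.
    return weights, biases

weights, biases = init_layer(fan_in=4, fan_out=3)
```

Random values break the symmetry between nodes, which an all-zero weight initialization would not do.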

Answer 20

For a perceptron, there can be one more input called the bias. While the **weights determine the slope of the classifier line**, the **bias allows us to shift the line to the left or right.**
* Normally, the bias is treated as another weighted input with the input value x0.
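
The separate roles of weights and bias can be sketched for a two-input perceptron, whose decision boundary is the line `w1*x1 + w2*x2 + bias = 0`. The function name `boundary_line` and the numeric values are assumptions for illustration.

```python
def boundary_line(w1, w2, bias):
    """Return (slope, intercept) of w1*x1 + w2*x2 + bias = 0, solved for x2.

    Assumes w2 is nonzero.
    """
    return -w1 / w2, -bias / w2

# Same weights, different bias values.
slope_a, intercept_a = boundary_line(1.0, 2.0, bias=0.0)
slope_b, intercept_b = boundary_line(1.0, 2.0, bias=-4.0)
```

Both lines have the same slope because the weights are unchanged. Only the intercept differs, so changing the bias shifts the line without rotating it.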


source: https://www.edureka.co/blog/interview-questions/deep-learning-interview-questions/ (45 cards)