Name the advantages of the LSTM cell/GRU compared to the Elman cell
Gates control what is written to and read from memory, and the state is updated additively instead of being completely overwritten at every step. This mitigates vanishing gradients and lets the network capture long-range dependencies, which the plain Elman cell struggles with.
Why are RNNs suitable for problems with time series?
They maintain hidden states that act as “short-term memories”, connecting information across time steps
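The recurrence can be sketched in a few lines of numpy (weight names and sizes below are illustrative, not from the source): the hidden state h is the only thing carried from step to step.

```python
import numpy as np

def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    """One Elman-style recurrence step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    return np.tanh(W_xh @ x + W_hh @ h_prev + b_h)

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4                 # toy sizes
W_xh = rng.normal(size=(hidden_dim, input_dim))
W_hh = rng.normal(size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                     # initial "short memory" is empty
for t in range(5):                           # unroll over a toy sequence
    x_t = rng.normal(size=input_dim)
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)    # h summarizes everything seen so far

print(h.shape)  # (4,)
```

The same weights (W_xh, W_hh, b_h) are reused at every time step, which is why the model size does not grow with the sequence length.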
What role does the hidden state play in RNNs?
It summarizes the inputs seen so far and is passed from one time step to the next, acting as the network's memory; the output at each step depends on both the current input and this state.
What are the pros and cons of a typical RNN architecture?
(+) * Can process inputs of any length
* Model size does not grow with the length of the input
* Computation takes historical information into account
* Weights are shared across time
(-) * Computation is slow (inherently sequential, hard to parallelize)
* Difficult to access information from many time steps ago
* Cannot take any future input into account for the current state
What are some RNN basic architectures? Name three applications where many-to-one and one-to-many RNNs would be beneficial.
1-to-1: Classic feed-forward for image classification
1-to-many: image captioning
many-to-1: sentiment analysis
many-to-many: 1. machine translation 2. video classification
Describe what an element of a batch would be for a recurrent network, e.g. by using an example.
An element of a batch represents one sequence of data points.
E.g., in a language-modeling task where the input is a sequence of words, an element of a batch would be a sentence or a paragraph.
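As a sketch with made-up sizes: a batch for a recurrent network is typically a 3-D array of shape (batch, time, features), and one element of it is an entire sequence.

```python
import numpy as np

# Illustrative sizes: 32 sentences per batch, 6 tokens each,
# every token embedded as an 8-dim vector.
batch_size, seq_len, embed_dim = 32, 6, 8
batch = np.zeros((batch_size, seq_len, embed_dim))

sentence = batch[0]      # one batch element: a whole sequence
print(sentence.shape)    # (6, 8) -> 6 time steps, 8 features each
```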
Why does the required memory space increase with higher batch sizes during training?
Backpropagation needs the intermediate activations of every time step for every sequence in the batch, so the stored activations (and thus the memory) grow roughly linearly with the batch size.
What is the difference between BPTT and TBPTT?
BPTT (backpropagation through time) unrolls the network over the full sequence and backpropagates the error through every time step. TBPTT (truncated BPTT) backpropagates only within a window of at most k steps: the hidden state is still carried across windows, but gradient flow is cut at the window boundary, trading exact gradients for much lower memory and compute cost.
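A framework-agnostic sketch of the truncation idea in TBPTT: the sequence is cut into windows of at most k steps, and gradients are only backpropagated inside each window (the chunking helper below is illustrative, not from the source).

```python
def tbptt_chunks(sequence, k):
    """Yield consecutive windows of at most k time steps.

    In TBPTT, the hidden state is carried from one window into the next,
    but it is treated as a constant there, so gradients never cross the
    window boundary.
    """
    for start in range(0, len(sequence), k):
        yield sequence[start:start + k]

seq = list(range(10))                       # toy sequence of 10 time steps
print(list(tbptt_chunks(seq, 4)))           # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```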
What are the main challenges of training RNNs?
Vanishing and exploding gradients over long sequences, slow (inherently sequential) computation, and the resulting difficulty of learning long-range dependencies.
What is the problem with deep RNNs?
Gradients must flow both through time and through the layer stack, so vanishing/exploding gradients become even more severe and training gets slow and unstable.
Give several applications where a recurrent neural network can be useful and explain why.
RNNs are useful wherever the data is sequential, due to their ability to process sequences and capture temporal dependencies:
- NLP e.g language translation, sentiment analysis, text generation, speech recognition
- Time Series Analysis: analyze historical data trends, forecasting future patterns of stock prices/market demand
- speech and audio processing
- image and video analysis
What is the main idea behind LSTMs?
Introduction of gates that control writing to and reading from a “memory” kept in an additional cell state
What is the role of the LSTM cell state?
It acts as the long-term memory: it is updated additively under the control of the gates, so information (and gradients) can flow across many time steps largely unchanged.
How are the internal states of an LSTM unit updated?
1) Forget gate: Forgetting old information in the cell state
2) Input gate: Deciding on new input for the cell state
3) Computing the updated cell state
4) Computing the updated hidden state
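The four steps above can be sketched as a single numpy function (weight names are illustrative and biases are omitted for brevity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W_f, W_i, W_c, W_o):
    z = np.concatenate([x, h_prev])      # current input + previous hidden state
    f = sigmoid(W_f @ z)                 # 1) forget gate
    i = sigmoid(W_i @ z)                 # 2) input gate
    c_tilde = np.tanh(W_c @ z)           #    candidate cell content
    c = f * c_prev + i * c_tilde         # 3) updated cell state (additive!)
    o = sigmoid(W_o @ z)                 #    output gate
    h = o * np.tanh(c)                   # 4) updated hidden state
    return h, c

rng = np.random.default_rng(1)
n_in, n_h = 3, 4                         # toy sizes
Ws = [rng.normal(size=(n_h, n_in + n_h)) for _ in range(4)]
h, c = np.zeros(n_h), np.zeros(n_h)
h, c = lstm_step(rng.normal(size=n_in), h, c, *Ws)
print(h.shape, c.shape)  # (4,) (4,)
```

The additive update in step 3 is what lets gradients flow through the cell state without vanishing as quickly as in a plain RNN.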
What does the Forget Gate in LSTM do?
Controls how much of the previous cell state is kept vs. forgotten before the new cell state (and from it the hidden state) is computed
What does the input gate in LSTM do?
Decides which information from the current inputs (input x + previous hidden state) is stored in the new cell state (and thus in the hidden state)
What does the output gate in LSTM do?
Decides which information from the (tanh-squashed) cell state is exposed as the new hidden state, i.e. as the unit's output
What is the main idea of GRU?
A variant of the LSTM unit, but with simpler and fewer parameters
No additional cell state!
→Memory operates only and directly via the hidden state
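A minimal numpy sketch of one GRU step (weight names illustrative, biases omitted); note that there is no cell state, only the hidden state h:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, W_z, W_r, W_h):
    z_in = np.concatenate([x, h_prev])
    z = sigmoid(W_z @ z_in)                                    # update gate
    r = sigmoid(W_r @ z_in)                                    # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([x, r * h_prev]))   # candidate state
    return (1 - z) * h_prev + z * h_tilde                      # blend old and new

rng = np.random.default_rng(2)
n_in, n_h = 3, 4                                               # toy sizes
W_z, W_r, W_h = (rng.normal(size=(n_h, n_in + n_h)) for _ in range(3))
h = np.zeros(n_h)
h = gru_step(rng.normal(size=n_in), h, W_z, W_r, W_h)
print(h.shape)  # (4,)
```

With only three weight matrices and a single state vector, the GRU has fewer parameters than the LSTM's four-gate, two-state design.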
How does GRU control the flow of information?
Via two gates: the reset gate decides how much of the previous hidden state enters the candidate state, and the update gate interpolates between the old hidden state and the candidate state.
In which scenarios would LSTMs be beneficial compared to GRUs?
For very long sequences or tasks needing fine-grained memory control: the separate cell state and output gate give LSTMs more capacity, which can pay off given enough data, while GRUs train faster and need fewer parameters.
Why are confounds a problem?
The model can learn the spurious correlation instead of the true signal, so it performs well on the training data for the wrong reason and fails to generalize once the confound changes.
What is a confound in the context of ML and statistical analysis?
An extraneous variable that correlates with both the input variables and the outcome variable being studied.
Give an example of a confound problem and non-confound problem
Confound problem:
- The model learns spurious features due to an imbalanced dataset, e.g. training a model to identify a tank in an image using tank images that were all recorded on cloudy days and non-tank images that were all recorded on sunny days
- Noisy recording conditions, e.g. speech recordings made with 2 different microphones, where the environment contains confounds such as the sensor, lighting, age/sex of the participants, temperature, …
Non-confound problem:
recognize handwritten digits from 0 to 9 (like in the MNIST dataset). Each digit is represented equally across various writing styles, orientations, and thicknesses.
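The tank example above can be illustrated with synthetic data (all numbers and feature names are made up): the label is perfectly predicted by a nuisance "brightness" feature, so a trivial threshold rule looks like a perfect classifier without ever looking at the object itself.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
label = rng.integers(0, 2, size=n)               # 1 = tank, 0 = no tank
brightness = np.where(label == 1,                # tanks photographed on
                      rng.normal(0.3, 0.05, n),  # cloudy (dark) days,
                      rng.normal(0.8, 0.05, n))  # non-tanks on sunny days

pred = (brightness < 0.55).astype(int)           # classify by brightness only
print((pred == label).mean())                    # ~1.0 on this training set
```

The rule would collapse as soon as a tank is photographed on a sunny day, which is exactly why the confound, not the model's accuracy, is the problem.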