8 - RNN and LSTM

ucla | CS 162 | 2024-02-06 13:16

RNNs
- Long-Term Dependency Limitation
LSTMs
Learning Neural LMs (Recap)

RNNs

use sequential information to make predictions
don't make the independence assumption that FFNNs make
perform the same task (with the same weights) at each step of a seq, and each step takes the previous step's output (hidden state) as input -> memory
this is better than n-grams because we can generate probs conditioned on the WHOLE context $[1:t-1]$, while n-grams are constrained to the window $[t-n+1:t-1]$
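A minimal NumPy sketch of the recurrence, assuming the standard Elman-style update $h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$. The weight names, sizes, and random initialisation are illustrative assumptions. The same weights are reused at every step, and perturbing only the first input still changes the last hidden state, which a fixed n-gram window shorter than the sequence could not do.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h, h0=None):
    """Run a vanilla (Elman-style) RNN over a sequence.

    xs:   (T, input_dim), one input vector per time step
    W_xh: (hidden_dim, input_dim) input-to-hidden weights
    W_hh: (hidden_dim, hidden_dim) hidden-to-hidden weights, shared across steps
    b_h:  (hidden_dim,) bias
    Returns all hidden states, shape (T, hidden_dim).
    """
    hidden_dim = W_hh.shape[0]
    h = np.zeros(hidden_dim) if h0 is None else h0
    states = []
    for x_t in xs:
        # Same weights at every step; h carries information from all earlier steps.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 6, 4, 3
W_xh = rng.normal(scale=0.5, size=(hidden_dim, input_dim))  # hypothetical weights
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

xs = rng.normal(size=(T, input_dim))
hs = rnn_forward(xs, W_xh, W_hh, b_h)

# Perturb only the FIRST input: the last hidden state still changes,
# i.e. h_T depends on x_1 (an n-gram model with a short window would ignore it).
xs_changed = xs.copy()
xs_changed[0] += 1.0
hs_changed = rnn_forward(xs_changed, W_xh, W_hh, b_h)
print(np.abs(hs[-1] - hs_changed[-1]).max())
```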
Long-Term Dependency Limitation
RNNs are theoretically capable of handling long-term dependencies, i.e., generating outputs that depend on tokens much, much earlier in a long context
but in practice they can't, due to vanishing gradients: the gradient signal from tokens far back in the context shrinks as it is backpropagated through many time steps
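To see why gradients vanish, consider a toy scalar RNN $h_t = \tanh(w h_{t-1} + u x_t)$ (a simplifying assumption; real RNNs use weight matrices). By the chain rule, $\partial h_T / \partial h_0 = \prod_{t=1}^{T} w\,(1 - h_t^2)$, and since $0 < 1 - h_t^2 \le 1$, every factor has magnitude at most $|w|$. The values $w = 0.9$, $u = 1$, and $x_t = 0.5$ below are hypothetical.

```python
import math

def rnn_grad_wrt_first_state(w, u, xs, h0=0.1):
    """Scalar RNN h_t = tanh(w * h_{t-1} + u * x_t).

    Returns dh_T / dh_0 via the chain rule:
        dh_T/dh_0 = prod_t  w * (1 - h_t**2)
    """
    h = h0
    grad = 1.0
    for x in xs:
        h = math.tanh(w * h + u * x)
        grad *= w * (1.0 - h * h)  # local derivative dh_t/dh_{t-1}
    return grad

xs = [0.5] * 100
grads = {T: rnn_grad_wrt_first_state(0.9, 1.0, xs[:T]) for T in (1, 10, 50, 100)}
for T, g in grads.items():
    print(f"T={T:3d}  dh_T/dh_0 = {g:.3e}")  # shrinks roughly geometrically with T
```

Each extra step multiplies in another factor below 1, so the signal from the first token decays geometrically with distance.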
LSTMs
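designed to mitigate the long-term dependency issue of RNNs
the key is the memory cell state $C_t$, which the gates can add info to or remove info from as the seq progresses; it is updated additively instead of through the cascading multiplications of a plain RNN, so gradients can flow back over many steps without vanishing as quickly
this is done using 3 gates that control the memory (input, forget, and output)
NOTE: each gate has its own weights (indicated by the subscript, e.g., $W_f$ vs. $W_i$)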
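A toy comparison of the two gradient paths, under the simplifying assumption that gate values and pre-activations are held fixed (only the direct path is followed, not the paths through $h_{t-1}$ into the gates). Along the cell state, $\partial C_t / \partial C_{t-1} = f_t$, so the gradient is a product of forget-gate values, which the network can learn to keep close to 1; the RNN path multiplies by $w \cdot \tanh'$ at every step. The numbers (forget value 0.99, RNN weight 0.9, 50 steps) are hypothetical.

```python
import math

def cell_path_grad(forget_values):
    """dC_T/dC_0 along the direct cell-state path only.

    With C_t = f_t * C_{t-1} + i_t * C~_t, the local derivative
    dC_t/dC_{t-1} is f_t (gate values held fixed; indirect paths ignored).
    """
    return math.prod(forget_values)

def rnn_path_grad(w, pre_activations):
    """dh_T/dh_0 for a scalar RNN: product of w * tanh'(a_t) over the steps."""
    grad = 1.0
    for a in pre_activations:
        grad *= w * (1.0 - math.tanh(a) ** 2)
    return grad

T = 50
lstm_g = cell_path_grad([0.99] * T)     # forget gate kept near 1
rnn_g = rnn_path_grad(0.9, [1.0] * T)   # typical RNN factors, each well below 1
print(f"cell path: {lstm_g:.3f}   RNN path: {rnn_g:.3e}")
```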
Input Gate
decides what new info from the current input to store in the cell state
consists of 2 parts
- sigmoid layer (input gate layer) $i_t$ - decides which values to update, with values between 0 and 1
- tanh layer - creates a vector of new candidate values (contextualized seqs) $\tilde C_t$, each between -1 and 1
e.g., add the gender of a new subject to the cell state, replacing the old subject's gender
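A small NumPy sketch of the input gate, assuming the common formulation where both parts read the concatenation $[h_{t-1}, x_t]$ and have their own weights ($W_i, b_i$ for the sigmoid layer, $W_C, b_C$ for the tanh layer). The function name `input_gate` and the random weights are illustrative, not taken from any library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gate(h_prev, x_t, W_i, b_i, W_C, b_C):
    """Return (i_t, C_tilde_t) computed from the concatenation [h_{t-1}, x_t]."""
    hx = np.concatenate([h_prev, x_t])
    i_t = sigmoid(W_i @ hx + b_i)       # how much of each candidate to write, in (0, 1)
    C_tilde = np.tanh(W_C @ hx + b_C)   # candidate values, in (-1, 1)
    return i_t, C_tilde

rng = np.random.default_rng(1)
hidden_dim, input_dim = 3, 2
h_prev = rng.normal(size=hidden_dim)
x_t = rng.normal(size=input_dim)

# Separate (hypothetical, randomly initialised) weights for the gate and the candidate layer
W_i = rng.normal(size=(hidden_dim, hidden_dim + input_dim))
b_i = np.zeros(hidden_dim)
W_C = rng.normal(size=(hidden_dim, hidden_dim + input_dim))
b_C = np.zeros(hidden_dim)

i_t, C_tilde = input_gate(h_prev, x_t, W_i, b_i, W_C, b_C)
write = i_t * C_tilde  # element-wise: what actually gets added to the cell state
print(i_t, C_tilde, write)
```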
Forget Gate
decides what info to remove from the cell state, using a sigmoid layer
looks at the previous hidden state $h_{t-1}$ and current input $x_t$ and outputs a vector of values between 0 and 1, one per cell-state entry (1 = keep completely, 0 = forget completely)
e.g., forget the gender of the old subject when we see a new subject
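A sketch of the forget gate, assuming $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$. The weights are hand-set for illustration (hypothetical): when the input's first feature is on, playing the role of "a new subject appeared", dimension 0 of the cell is pushed toward forgetting and dimension 1 toward keeping.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(h_prev, x_t, W_f, b_f):
    """f_t = sigmoid(W_f [h_{t-1}, x_t] + b_f); one value in (0, 1) per cell dimension."""
    return sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)

hidden_dim, input_dim = 2, 2
h_prev = np.zeros(hidden_dim)
x_t = np.array([1.0, 0.0])  # hypothetical: feature 0 = "new subject appeared"

# Hand-set weights over [h_prev (2 entries), x_t (2 entries)]:
# feature x_t[0] drives dimension 0 toward forget and dimension 1 toward keep.
W_f = np.array([[0.0, 0.0, -10.0, 0.0],
                [0.0, 0.0,  10.0, 0.0]])
b_f = np.zeros(hidden_dim)

f_t = forget_gate(h_prev, x_t, W_f, b_f)
C_prev = np.array([0.8, -0.6])
kept = f_t * C_prev  # dimension 0 is (almost) erased, dimension 1 is kept
print(f_t, kept)
```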
Next Step Context (Cell State)
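updates the old cell state $C_{t-1}$ into the new cell state $C_t$
multiply the old state by the forget gate $f_t$ (dropping what we decided to forget)
add the scaled candidate values $i_t * \tilde C_t$ (element-wise multiplication)
$C_t = f_t * C_{t-1} + i_t * \tilde C_t$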
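The update is just two element-wise products and a sum. With hand-picked (illustrative) gate values, each dimension below shows a different case: keep the old value, overwrite it with the candidate, or blend the two.

```python
import numpy as np

def update_cell(C_prev, f_t, i_t, C_tilde):
    """C_t = f_t * C_{t-1} + i_t * C~_t (all products element-wise)."""
    return f_t * C_prev + i_t * C_tilde

# Hand-picked gate values for a 3-dimensional cell (illustrative only)
C_prev  = np.array([0.5, 0.5, 0.5])
f_t     = np.array([1.0, 0.0, 0.5])   # keep, erase, half-keep
i_t     = np.array([0.0, 1.0, 0.5])   # ignore, write, half-write
C_tilde = np.array([0.9, -0.3, 0.2])

C_t = update_cell(C_prev, f_t, i_t, C_tilde)
print(C_t)  # approximately [0.5, -0.3, 0.35]
```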
Output Gate
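decides what to output: compute the output gate $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$, push the cell state through tanh (squashing values to between -1 and 1), then multiply: $h_t = o_t * \tanh(C_t)$
$h_t$ is the output at this step and is also propagated, together with $C_t$, to the next cell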
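A sketch of the output step, assuming the usual $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$ and $h_t = o_t * \tanh(C_t)$; `output_step` and the random weights are hypothetical. Because $0 < o_t < 1$ and $|\tanh(C_t)| < 1$, $h_t$ stays in $(-1, 1)$ and keeps the sign of $C_t$ even when the cell values themselves are large.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def output_step(h_prev, x_t, C_t, W_o, b_o):
    """o_t = sigmoid(W_o [h_{t-1}, x_t] + b_o);  h_t = o_t * tanh(C_t)."""
    o_t = sigmoid(W_o @ np.concatenate([h_prev, x_t]) + b_o)
    h_t = o_t * np.tanh(C_t)  # squash the cell state to (-1, 1), then filter it
    return o_t, h_t

rng = np.random.default_rng(2)
hidden_dim, input_dim = 3, 2
h_prev = rng.normal(size=hidden_dim)
x_t = rng.normal(size=input_dim)
C_t = np.array([2.0, -3.0, 0.5])  # cell values are not bounded, unlike h_t
W_o = rng.normal(size=(hidden_dim, hidden_dim + input_dim))
b_o = np.zeros(hidden_dim)

o_t, h_t = output_step(h_prev, x_t, C_t, W_o, b_o)
# h_t is the output at this step (e.g. fed to a softmax over the vocabulary)
# and is passed, together with C_t, to the next step.
print(o_t, h_t)
```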
Complete Structure
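each gate and the tanh candidate layer has its own weight matrix and adds its own bias term; $[h_{t-1}, x_t]$ is the concatenation of the previous hidden state and the current input
$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$
$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$
$\tilde C_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$
$C_t = f_t * C_{t-1} + i_t * \tilde C_t$
$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$
$h_t = o_t * \tanh(C_t)$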
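Putting the pieces together, here is a minimal NumPy sketch of a full LSTM layer run over a sequence, assuming the standard formulation with a separate weight matrix and bias per gate, each applied to $[h_{t-1}, x_t]$. The helper names (`init_params`, `lstm_step`, `lstm_forward`), the dict layout, and the 0.1 initialisation scale are illustrative assumptions; library implementations typically stack the four weight matrices into one for speed but compute the same quantities.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(input_dim, hidden_dim, seed=0):
    """Separate weight matrix and bias for each gate (f, i, o) and the candidate layer (C)."""
    rng = np.random.default_rng(seed)
    params = {}
    for name in ("f", "i", "C", "o"):
        params["W_" + name] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim + input_dim))
        params["b_" + name] = np.zeros(hidden_dim)
    return params

def lstm_step(x_t, h_prev, C_prev, p):
    hx = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    f_t = sigmoid(p["W_f"] @ hx + p["b_f"])      # forget gate
    i_t = sigmoid(p["W_i"] @ hx + p["b_i"])      # input gate
    C_tilde = np.tanh(p["W_C"] @ hx + p["b_C"])  # candidate values
    C_t = f_t * C_prev + i_t * C_tilde           # additive cell-state update
    o_t = sigmoid(p["W_o"] @ hx + p["b_o"])      # output gate
    h_t = o_t * np.tanh(C_t)                     # new hidden state / output
    return h_t, C_t

def lstm_forward(xs, p, hidden_dim):
    """Run the cell over xs of shape (T, input_dim); return all h_t and the final C_T."""
    h = np.zeros(hidden_dim)
    C = np.zeros(hidden_dim)
    hs = []
    for x_t in xs:
        h, C = lstm_step(x_t, h, C, p)
        hs.append(h)
    return np.stack(hs), C

input_dim, hidden_dim, T = 4, 3, 5
params = init_params(input_dim, hidden_dim)
xs = np.random.default_rng(1).normal(size=(T, input_dim))
hs, C_T = lstm_forward(xs, params, hidden_dim)
print(hs.shape)  # (5, 3)
```

As a quick experiment, setting `b_f` to a large positive value and `b_i` to a large negative one makes `lstm_step` pass $C_{t-1}$ through almost unchanged, which is the "memory" behaviour the cell state is meant to provide.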