RNN Sequence Modeling
Using Recurrent Neural Networks for next-word prediction -- learning sequential patterns from text data.
What is an RNN?
A Recurrent Neural Network processes sequential data step-by-step, using a hidden state that carries information from previous steps. Unlike feedforward networks, RNNs have a "memory" that makes them suitable for text, speech, and time-series data where order matters.
Key idea: RNNs process data one step at a time, carrying forward the previous step's hidden state as context for the next step.
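The recurrence behind this idea can be sketched in a few lines of NumPy. This is an illustrative toy, not the Keras internals: the weight matrices W_x and W_h, the dimensions, and the random inputs are all made up, and in a real RNN the weights are learned.

```python
import numpy as np

np.random.seed(0)
embed_dim, hidden_dim = 4, 3

# Hypothetical weights -- in a trained RNN these are learned
W_x = np.random.randn(hidden_dim, embed_dim) * 0.1   # input -> hidden
W_h = np.random.randn(hidden_dim, hidden_dim) * 0.1  # hidden -> hidden
b = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    # Core RNN update: the new state mixes the current input
    # with the previous hidden state
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Process a 3-step sequence, carrying the hidden state forward
h = np.zeros(hidden_dim)
for x_t in np.random.randn(3, embed_dim):
    h = rnn_step(x_t, h)

print(h)  # the final hidden state summarizes the whole sequence
```

The same `rnn_step` is applied at every position; only the hidden state changes, which is exactly the "memory" described above.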
How RNN Works
Step-by-Step Process
- Input sequence is processed one token at a time
- Hidden state is updated at each step, storing information about past inputs
- Recurrent connection passes the previous hidden state as input to the next step
- Output is produced after processing all steps
Input Sequence:   "hello"      "how"       "are"
                    x1           x2          x3
                    |            |           |
                Embedding    Embedding   Embedding
                    |            |           |
Hidden States: h0 ----> h1 ----------> h2 ----------> h3
                                                       |
                                               Dense + Softmax
                                                       |
                                             Predicted next word
Code: Prepare Text Data
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense
# Example text corpus
corpus = [
    "hello how are you",
    "hello how is your day",
    "hello how are your friends",
    "hello what are you doing"
]
# Tokenize words
tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
total_words = len(tokenizer.word_index) + 1
print("Total unique words:", total_words)
# Create input sequences (n-grams)
input_sequences = []
for line in corpus:
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i+1]
        input_sequences.append(n_gram_sequence)
# Pad sequences to same length
max_seq_len = max([len(x) for x in input_sequences])
input_sequences = np.array(pad_sequences(input_sequences, maxlen=max_seq_len, padding='pre'))
# Split inputs and labels
X = input_sequences[:,:-1]
y = input_sequences[:,-1]
print("Example X[0]:", X[0], "-> y[0]:", y[0])
Code: Build and Train RNN
# Build RNN Model
model = Sequential()
# Embedding layer: converts word indices into dense vectors
model.add(Embedding(input_dim=total_words, output_dim=10, input_length=max_seq_len-1))
# RNN layer: learns sequence patterns
model.add(SimpleRNN(50, activation='relu'))
# Output layer: predicts next word
model.add(Dense(total_words, activation='softmax'))
# Compile the model
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(X, y, epochs=200, verbose=0)
print("Training complete!")
Code: Predict Next Word
def predict_next_word(model, tokenizer, text_seq, max_seq_len):
    token_list = tokenizer.texts_to_sequences([text_seq])[0]
    token_list = pad_sequences([token_list], maxlen=max_seq_len-1, padding='pre')
    predicted = model.predict(token_list, verbose=0)
    predicted_word_index = int(np.argmax(predicted, axis=-1)[0])
    # index 0 is reserved for padding and maps to no word,
    # so return None if the model ever predicts it
    return tokenizer.index_word.get(predicted_word_index)
# Test the model
seed_text = "hello how is"
next_word = predict_next_word(model, tokenizer, seed_text, max_seq_len)
print(f"Input: '{seed_text}' -> Predicted next word: '{next_word}'")
Key Takeaways
- Embedding layer converts word indices into dense vector representations
- SimpleRNN processes the sequence step-by-step, updating hidden state
- Dense + softmax outputs probability distribution over all words
- Loss: sparse_categorical_crossentropy (integer labels)
- 200 epochs are needed because the dataset is tiny; with so few examples, the model requires many passes to fit the patterns
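The "Dense + softmax, then argmax" step from the takeaways can be illustrated with made-up logits over a 5-word vocabulary (the logit values are arbitrary, chosen only for the example):

```python
import numpy as np

def softmax(z):
    z = z - z.max()            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits from the Dense layer for a 5-word vocabulary
logits = np.array([1.0, 3.0, 0.5, 2.0, -1.0])
probs = softmax(logits)

print(probs)                        # sums to 1: a valid distribution
next_word_index = int(np.argmax(probs))  # greedy pick of the next word
print(next_word_index)              # index 1 has the highest logit
```

Greedy argmax always picks the single most likely word; text-generation code often samples from `probs` instead to get more varied output.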
When to Use RNNs
| Good For | Not Ideal For |
| --- | --- |
| Short sequences | Long sequences (use LSTM instead) |
| Simple text generation | Image data (use CNNs) |
| Basic time-series | When parallelization needed (use Transformers) |
Standard RNNs suffer from vanishing gradients -- they forget information from early steps in long sequences. For anything beyond short sequences, use LSTM or GRU instead.
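The vanishing-gradient effect can be seen numerically: backpropagation through time multiplies one Jacobian factor per step, and when those factors have magnitude below 1 the product shrinks exponentially. A toy scalar illustration (the recurrent weight 0.5 is made up):

```python
# A recurrent weight with magnitude < 1
w = 0.5

# Each time step multiplies the gradient by one more factor of w
grad = 1.0
history = []
for step in range(50):
    grad *= w
    history.append(grad)

print(history[4])   # after 5 steps: 0.03125
print(history[49])  # after 50 steps: ~8.9e-16, effectively zero --
                    # the earliest inputs no longer influence learning
```

LSTMs and GRUs counter this with gating and an additive cell-state path, which is why they handle long sequences far better than a plain SimpleRNN.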