RNN Sequence Modeling
Using Recurrent Neural Networks for next-word prediction -- learning sequential patterns from text data.
What is an RNN?
A Recurrent Neural Network processes sequential data step-by-step, using a hidden state that carries information from previous steps. Unlike feedforward networks, RNNs have a "memory" that makes them suitable for text, speech, and time-series data where order matters.
Key idea: RNNs process data one step at a time, carrying forward the previous step's hidden state as context for the next step.
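The recurrence behind this idea can be sketched in a few lines of NumPy. This is an illustrative toy, not the Keras internals: the weight matrices W_x and W_h, the dimensions, and the random inputs are all made up, and in a real RNN the weights are learned.

```python
import numpy as np

np.random.seed(0)
embed_dim, hidden_dim = 4, 3

# Hypothetical weights -- in a trained RNN these are learned
W_x = np.random.randn(hidden_dim, embed_dim) * 0.1   # input -> hidden
W_h = np.random.randn(hidden_dim, hidden_dim) * 0.1  # hidden -> hidden
b = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    # Core RNN update: the new state mixes the current input
    # with the previous hidden state
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Process a 3-step sequence, carrying the hidden state forward
h = np.zeros(hidden_dim)
for x_t in np.random.randn(3, embed_dim):
    h = rnn_step(x_t, h)

print(h)  # the final hidden state summarizes the whole sequence
```

The same `rnn_step` is applied at every position; only the hidden state changes, which is exactly the "memory" described above.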
How RNN Works
Step-by-Step Process
- Input sequence is processed one token at a time
- Hidden state is updated at each step, storing information about past inputs
- Recurrent connection passes the previous hidden state as input to the next step
- Output is produced after processing all steps
Input Sequence:   "hello"      "how"       "are"
                    x1           x2          x3
                    |            |           |
                Embedding    Embedding   Embedding
                    |            |           |
Hidden States: h0 ----> h1 ----------> h2 ----------> h3
                                                       |
                                               Dense + Softmax
                                                       |
                                             Predicted next word
Code: Prepare Text Data
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense
# Example text corpus
corpus = [
    "hello how are you",
    "hello how is your day",
    "hello how are your friends",
    "hello what are you doing"
]
# Tokenize words
tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
total_words = len(tokenizer.word_index) + 1
print("Total unique words:", total_words)
# Create input sequences (n-grams)
input_sequences = []
for line in corpus:
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i+1]
        input_sequences.append(n_gram_sequence)
# Pad sequences to same length
max_seq_len = max([len(x) for x in input_sequences])
input_sequences = np.array(pad_sequences(input_sequences, maxlen=max_seq_len, padding='pre'))
# Split inputs and labels
X = input_sequences[:,:-1]
y = input_sequences[:,-1]
print("Example X[0]:", X[0], "-> y[0]:", y[0])
Code: Build and Train RNN
# Build RNN Model
model = Sequential()
# Embedding layer: converts word indices into dense vectors
model.add(Embedding(input_dim=total_words, output_dim=10, input_length=max_seq_len-1))
# RNN layer: learns sequence patterns
model.add(SimpleRNN(50, activation='relu'))
# Output layer: predicts next word
model.add(Dense(total_words, activation='softmax'))
# Compile the model
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(X, y, epochs=200, verbose=0)
print("Training complete!")
Code: Predict Next Word
def predict_next_word(model, tokenizer, text_seq, max_seq_len):
    token_list = tokenizer.texts_to_sequences([text_seq])[0]
    token_list = pad_sequences([token_list], maxlen=max_seq_len-1, padding='pre')
    predicted = model.predict(token_list, verbose=0)
    predicted_word_index = int(np.argmax(predicted, axis=-1)[0])
    # index 0 is reserved for padding and maps to no word,
    # so return None if the model ever predicts it
    return tokenizer.index_word.get(predicted_word_index)
# Test the model
seed_text = "hello how is"
next_word = predict_next_word(model, tokenizer, seed_text, max_seq_len)
print(f"Input: '{seed_text}' -> Predicted next word: '{next_word}'")
Key Takeaways
- Embedding layer converts word indices into dense vector representations
- SimpleRNN processes the sequence step-by-step, updating hidden state
- Dense + softmax outputs probability distribution over all words
- Loss: sparse_categorical_crossentropy (integer labels)
- 200 epochs are needed because the dataset is tiny; with so few examples, the model requires many passes to fit the patterns
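The "Dense + softmax, then argmax" step from the takeaways can be illustrated with made-up logits over a 5-word vocabulary (the logit values are arbitrary, chosen only for the example):

```python
import numpy as np

def softmax(z):
    z = z - z.max()            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits from the Dense layer for a 5-word vocabulary
logits = np.array([1.0, 3.0, 0.5, 2.0, -1.0])
probs = softmax(logits)

print(probs)                        # sums to 1: a valid distribution
next_word_index = int(np.argmax(probs))  # greedy pick of the next word
print(next_word_index)              # index 1 has the highest logit
```

Greedy argmax always picks the single most likely word; text-generation code often samples from `probs` instead to get more varied output.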
When to Use RNNs
| Good For | Not Ideal For |
| --- | --- |
| Short sequences | Long sequences (use LSTM instead) |
| Simple text generation | Image data (use CNNs) |
| Basic time-series | When parallelization needed (use Transformers) |
Standard RNNs suffer from vanishing gradients -- they forget information from early steps in long sequences. For anything beyond short sequences, use LSTM or GRU instead.
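The vanishing-gradient effect can be seen numerically: backpropagation through time multiplies one Jacobian factor per step, and when those factors have magnitude below 1 the product shrinks exponentially. A toy scalar illustration (the recurrent weight 0.5 is made up):

```python
# A recurrent weight with magnitude < 1
w = 0.5

# Each time step multiplies the gradient by one more factor of w
grad = 1.0
history = []
for step in range(50):
    grad *= w
    history.append(grad)

print(history[4])   # after 5 steps: 0.03125
print(history[49])  # after 50 steps: ~8.9e-16, effectively zero --
                    # the earliest inputs no longer influence learning
```

LSTMs and GRUs counter this with gating and an additive cell-state path, which is why they handle long sequences far better than a plain SimpleRNN.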