
RNN Sequence Modeling

Using Recurrent Neural Networks for next-word prediction -- learning sequential patterns from text data.

What is an RNN?

A Recurrent Neural Network processes sequential data step-by-step, using a hidden state that carries information from previous steps. Unlike feedforward networks, RNNs have a "memory" that makes them suitable for text, speech, and time-series data where order matters.

Key idea: RNNs process data one step at a time, using the previous step's output (hidden state) as context for the next step.
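This recurrence can be sketched in a few lines of NumPy. This is a toy illustration, not the Keras implementation; the weight shapes, the tanh activation, and the random toy inputs are all assumptions made for the example:

```python
import numpy as np

def rnn_step(x, h_prev, Wx, Wh, b):
    """One recurrent step: mix the current input with the previous hidden state."""
    return np.tanh(x @ Wx + h_prev @ Wh + b)

rng = np.random.default_rng(0)
embed_dim, hidden_dim = 3, 5
Wx = rng.normal(size=(embed_dim, hidden_dim))   # input-to-hidden weights
Wh = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden (recurrent) weights
b = np.zeros(hidden_dim)

# Process a 4-step sequence of (toy) embedding vectors one step at a time
sequence = rng.normal(size=(4, embed_dim))
h = np.zeros(hidden_dim)  # h0: empty memory
for x in sequence:
    h = rnn_step(x, h, Wx, Wh, b)  # h carries context forward to the next step

print(h)  # the final hidden state summarizes the whole sequence
```

Note that the same weights `Wx`, `Wh`, `b` are reused at every step; only the hidden state changes.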

How an RNN Works

Step-by-Step Process
  1. Input sequence is processed one token at a time
  2. Hidden state is updated at each step, storing information about past inputs
  3. Recurrent connection passes the previous hidden state as input to the next step
  4. Output is produced after processing all steps
```
Input Sequence:   "hello"      "how"       "are"
                    x1           x2          x3
                    |            |           |
                Embedding    Embedding   Embedding
                    |            |           |
Hidden States: h0 ----> h1 --------> h2 --------> h3
                                                   |
                                           Dense + Softmax
                                                   |
                                         Predicted next word
```

Code: Prepare Text Data

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense

# Example text corpus
corpus = [
    "hello how are you",
    "hello how is your day",
    "hello how are your friends",
    "hello what are you doing"
]

# Tokenize words
tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
total_words = len(tokenizer.word_index) + 1
print("Total unique words:", total_words)

# Create input sequences (n-grams)
input_sequences = []
for line in corpus:
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i+1]
        input_sequences.append(n_gram_sequence)

# Pad sequences to the same length
max_seq_len = max(len(x) for x in input_sequences)
input_sequences = np.array(pad_sequences(input_sequences, maxlen=max_seq_len, padding='pre'))

# Split inputs and labels
X = input_sequences[:, :-1]
y = input_sequences[:, -1]
print("Example X[0]:", X[0], "-> y[0]:", y[0])
```
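To see what the n-gram expansion and pre-padding produce, here is a pure-Python walk-through of one corpus line. The word-to-index mapping is a hand-written stand-in for illustration; Keras's `Tokenizer` actually assigns indices by word frequency:

```python
# Hand-assigned indices for "hello how are you" (illustrative only)
tokens = [1, 2, 3, 4]  # hello=1, how=2, are=3, you=4

# Every prefix of length >= 2 becomes one training example
ngrams = [tokens[:i+1] for i in range(1, len(tokens))]
# -> [[1, 2], [1, 2, 3], [1, 2, 3, 4]]

# Pre-pad with zeros so all rows share the same length
max_len = max(len(s) for s in ngrams)
padded = [[0] * (max_len - len(s)) + s for s in ngrams]
# -> [[0, 0, 1, 2], [0, 1, 2, 3], [1, 2, 3, 4]]

# The last column is the label (next word); the rest is the input
X = [row[:-1] for row in padded]
y = [row[-1] for row in padded]
print(X)  # [[0, 0, 1], [0, 1, 2], [1, 2, 3]]
print(y)  # [2, 3, 4]
```

So each sentence yields several (context, next-word) pairs, which is why even a 4-line corpus gives the model something to learn from.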

Code: Build and Train RNN

```python
# Build the RNN model
model = Sequential()

# Embedding layer: converts word indices into dense vectors
model.add(Embedding(input_dim=total_words, output_dim=10, input_length=max_seq_len-1))

# RNN layer: learns sequence patterns
model.add(SimpleRNN(50, activation='relu'))

# Output layer: predicts the next word
model.add(Dense(total_words, activation='softmax'))

# Compile the model
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

# Train
history = model.fit(X, y, epochs=200, verbose=0)
print("Training complete!")
```
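The loss used here, sparse categorical crossentropy, is simply the negative log-probability that the softmax assigns to the true next word; the "sparse" variant accepts integer labels like `y` directly, with no one-hot encoding. A minimal NumPy version of the formula (an illustration, not Keras's implementation):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def sparse_cce(logits, true_index):
    """Negative log-probability of the correct class."""
    return -np.log(softmax(logits)[true_index])

# Uniform logits over 4 classes: loss = log(4) ~= 1.386
print(sparse_cce(np.zeros(4), true_index=2))

# A confident, correct prediction gives a loss near zero
print(sparse_cce(np.array([0.0, 10.0, 0.0, 0.0]), true_index=1))
```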

Code: Predict Next Word

```python
def predict_next_word(model, tokenizer, text_seq, max_seq_len):
    token_list = tokenizer.texts_to_sequences([text_seq])[0]
    token_list = pad_sequences([token_list], maxlen=max_seq_len-1, padding='pre')
    predicted = model.predict(token_list, verbose=0)
    predicted_word_index = np.argmax(predicted)
    # Map the predicted index back to its word
    for word, index in tokenizer.word_index.items():
        if index == predicted_word_index:
            return word
    return None  # e.g. the padding index 0 was predicted

# Test the model
seed_text = "hello how is"
next_word = predict_next_word(model, tokenizer, seed_text, max_seq_len)
print(f"Input: '{seed_text}' -> Predicted next word: '{next_word}'")
```
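A next-word predictor can be applied repeatedly to generate longer continuations. Below is a minimal greedy-decoding sketch; `predict_fn` is a stand-in for a call such as `lambda t: predict_next_word(model, tokenizer, t, max_seq_len)`, and the toy bigram table exists only so the example runs on its own:

```python
def generate(predict_fn, seed_text, n_words):
    """Greedily append the predicted next word up to n_words times."""
    text = seed_text
    for _ in range(n_words):
        next_word = predict_fn(text)
        if next_word is None:  # no prediction available; stop early
            break
        text += " " + next_word
    return text

# Toy stand-in for the trained model: a bigram lookup on the last word
toy_bigrams = {"is": "your", "your": "day"}
toy_predict = lambda text: toy_bigrams.get(text.split()[-1])

print(generate(toy_predict, "hello how is", 2))
# -> hello how is your day
```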

Key Takeaways

When to Use RNNs

| Good For | Not Ideal For |
| --- | --- |
| Short sequences | Long sequences (use LSTM instead) |
| Simple text generation | Image data (use CNNs) |
| Basic time-series | When parallelization is needed (use Transformers) |

Standard RNNs suffer from vanishing gradients -- they forget information from early steps in long sequences. For anything beyond short sequences, use LSTM or GRU instead.
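The vanishing-gradient effect can be seen numerically: backpropagation through time multiplies the gradient by the recurrent weight matrix (times a tanh-derivative factor that is at most 1) once per step, so when the weights' spectral norm is below 1 the signal from early steps decays geometrically. A toy NumPy illustration; the matrix size, step count, and the 0.9 norm are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, steps = 16, 50

# Recurrent weight matrix scaled to spectral norm 0.9 (below 1)
W = rng.standard_normal((hidden_dim, hidden_dim))
W *= 0.9 / np.linalg.norm(W, 2)

# Backprop through time repeatedly multiplies the gradient by W^T
# (the tanh-derivative factor, <= 1 everywhere, only shrinks it further)
grad = np.ones(hidden_dim)
initial_norm = np.linalg.norm(grad)
for _ in range(steps):
    grad = W.T @ grad

# After 50 steps the gradient is a tiny fraction of its original size,
# so early inputs barely influence learning
print(np.linalg.norm(grad) / initial_norm)
```

LSTM and GRU cells counter this with gated additive state updates, which give gradients a path that is not repeatedly squashed.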
