LSTM Time Series Forecasting
Using a stacked LSTM model to predict future sales values from historical time-series data.
What It Is
This notebook applies an LSTM to a regression task: predicting the next value in a time series. Instead of text tokens, the input is a sliding window of past numeric values, and the target is the next value in the sequence.
In short: a lookback window of past values predicts the next value, letting the model learn temporal patterns such as trend and seasonality.
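As a quick standalone illustration of the lookback idea (a toy numpy sketch, separate from the notebook's own code below):

```python
import numpy as np

series = np.arange(1, 11)  # toy series: 1..10
lookback = 3

# Each row of X holds `lookback` past values; y is the value that follows
X = np.array([series[i:i+lookback] for i in range(len(series) - lookback)])
y = series[lookback:]

print(X[0], "->", y[0])  # [1 2 3] -> 4
print(X.shape, y.shape)  # (7, 3) (7,)
```

Each training pair is simply "three past values in, the fourth out", slid one step at a time along the series.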
How It Works
Pipeline Steps
- Generate data -- Create synthetic sales data with a trend
- Normalize -- Scale values to [0, 1] using MinMaxScaler
- Create sequences -- Sliding window of 5 past values predicts the next value
- Reshape for LSTM -- Input shape: (samples, timesteps, features)
- Train stacked LSTM -- Two LSTM layers (50 and 25 units)
- Predict and inverse-transform -- Convert predictions back to original scale
Code: Generate and Prepare Data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
# Create a simple sales dataset (hypothetical)
np.random.seed(42)
data = np.cumsum(np.random.randn(100)) + 100  # Cumulative sum of noise -> random walk with drift
# Convert to DataFrame
df = pd.DataFrame(data, columns=['Sales'])
df.plot(title="Sales Over Time", figsize=(10, 5))
plt.show()
Code: Create Sequences and Split
# Create sequences (lookback window of 5)
def create_sequences(data, lookback=5):
    X, y = [], []
    for i in range(len(data) - lookback):
        X.append(data[i:i+lookback])
        y.append(data[i+lookback])
    return np.array(X), np.array(y)
# Normalize Data
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(df)
# Create sequences
X, y = create_sequences(scaled_data)
# Split into Train and Test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
# Reshape for LSTM (samples, timesteps, features)
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
Code: Build and Train Model
# Define LSTM Model
model = Sequential([
    LSTM(50, activation='relu', return_sequences=True, input_shape=(X_train.shape[1], 1)),
    LSTM(25, activation='relu', return_sequences=False),
    Dense(1)  # Output layer (regression -- no activation)
])
# Compile Model
model.compile(optimizer='adam', loss='mse')
# Train Model
history = model.fit(X_train, y_train, epochs=50, batch_size=8,
                    validation_data=(X_test, y_test), verbose=1)
Key Architecture Details
- return_sequences=True on the first LSTM: outputs a sequence (needed to stack LSTM layers)
- return_sequences=False on the second LSTM: outputs only the last hidden state
- Dense(1) with no activation: linear output for regression
- Loss: MSE (Mean Squared Error) -- standard for regression tasks
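To make the return_sequences distinction concrete, here is a minimal numpy sketch of one LSTM layer's forward pass (random weights, illustration only -- not Keras's actual implementation):

```python
import numpy as np

def lstm_forward(x, units, rng):
    """Run a toy LSTM over x of shape (timesteps, features).
    Returns (all_hidden_states, last_hidden_state)."""
    timesteps, features = x.shape
    # One combined weight matrix for the 4 gates (input, forget, cell, output)
    W = rng.standard_normal((features + units, 4 * units)) * 0.1
    b = np.zeros(4 * units)
    h = np.zeros(units)
    c = np.zeros(units)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    hs = []
    for t in range(timesteps):
        z = np.concatenate([x[t], h]) @ W + b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell state
        h = sigmoid(o) * np.tanh(c)                   # update hidden state
        hs.append(h)
    return np.stack(hs), h

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 1))        # 5 timesteps, 1 feature
seq, last = lstm_forward(x, units=50, rng=rng)
print(seq.shape)   # (5, 50) -- what return_sequences=True passes to the next layer
print(last.shape)  # (50,)   -- what return_sequences=False passes to Dense
```

The second LSTM layer needs the full (timesteps, units) sequence as input, which is why the first layer must set return_sequences=True; the Dense head only needs the final state.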
Code: Predict and Visualize
# Plot Training Loss
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.legend()
plt.title("Loss Over Epochs")
plt.show()
# Predict Test Data
y_pred = model.predict(X_test)
# Inverse Transform to Original Scale
y_pred_inv = scaler.inverse_transform(y_pred)
y_test_inv = scaler.inverse_transform(y_test.reshape(-1, 1))
# Plot Predictions vs Actual
plt.figure(figsize=(10, 5))
plt.plot(y_test_inv, label="Actual Sales", marker='o')
plt.plot(y_pred_inv, label="Predicted Sales", marker='x')
plt.legend()
plt.title("LSTM Predictions vs Actual Sales")
plt.show()
Always normalize your data before feeding it to an LSTM; neural networks train much better when inputs lie in the [0, 1] range. Remember to inverse-transform predictions before interpreting the results.
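The scale/inverse-scale round trip can be sketched with plain numpy (the same arithmetic MinMaxScaler performs under the hood):

```python
import numpy as np

values = np.array([[100.0], [105.0], [98.0], [110.0]])

# "Fit": remember the min and range of the training data
vmin, vmax = values.min(), values.max()
scaled = (values - vmin) / (vmax - vmin)   # now in [0, 1]

# A model prediction lives on the scaled axis...
pred_scaled = np.array([[0.75]])
# ...so invert the transform before reading it as "sales"
pred = pred_scaled * (vmax - vmin) + vmin
print(pred)  # [[107.]]
```

Reading 0.75 as "sales of 0.75" would be meaningless; only after the inverse transform does the prediction land back on the original scale.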
When to Use LSTM for Time Series
| Good For | Not Ideal For |
| --- | --- |
| Data with temporal dependencies | Stationary data (ARIMA may suffice) |
| Non-linear trends | Very short sequences |
| Multi-step forecasting | When interpretability is needed |
| Multivariate time series | Small datasets (overfitting risk) |
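Whichever column your problem falls in, it is worth comparing against a naive persistence baseline (predict "tomorrow = today"). A sketch on the same kind of random walk as the notebook's synthetic data:

```python
import numpy as np

np.random.seed(42)
series = np.cumsum(np.random.randn(100)) + 100  # random walk, like the notebook's data

# Persistence baseline: predict each value as the previous one
actual = series[1:]
naive = series[:-1]
mae = np.mean(np.abs(actual - naive))
print(f"Naive baseline MAE: {mae:.3f}")
```

On a pure random walk, no model can beat persistence in expectation; if the LSTM's test error is not clearly below this baseline, the extra model complexity isn't paying off.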