Association Rule Mining
An unsupervised technique to discover interesting relationships (rules) between items in transactional datasets.
What It Is
Association Rule Mining finds patterns like "If a customer buys X, they are likely to buy Y." It works on transactional data (shopping baskets, web clicks, symptoms) and produces rules evaluated by support, confidence, and lift.
The classic application is Market Basket Analysis: "Customers who bought bread and butter also bought milk." Amazon's "Frequently Bought Together" feature is a well-known example of this technique.
Key Metrics
Rule Evaluation Metrics
Support(A -> B) = how frequently A and B appear together across all transactions
= count(A and B) / total_transactions
Confidence(A -> B) = how often the rule holds when A is present
= count(A and B) / count(A)
Lift(A -> B) = rule strength relative to random chance
= confidence(A -> B) / support(B)
Lift > 1 = positive association (A makes B more likely)
Lift = 1 = independent
Lift < 1 = negative association (A makes B less likely)
Example
Transactions:
T1: Milk, Bread, Butter
T2: Bread, Butter
T3: Milk, Bread
T4: Milk, Bread, Butter
Rule: {Milk, Bread} -> {Butter}
Support = 2/4 = 50%
Confidence = 2/3 = 66.7%
Lift = 0.667 / 0.75 = 0.89 (lift < 1: despite the 66.7% confidence, butter is slightly less likely given milk and bread than it is overall, because butter already appears in 75% of baskets)
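The example numbers can be reproduced with a few lines of Python (the transaction list and the support helper below are illustrative, not part of any library):

```python
# The four example transactions as sets of items
transactions = [
    {'Milk', 'Bread', 'Butter'},   # T1
    {'Bread', 'Butter'},           # T2
    {'Milk', 'Bread'},             # T3
    {'Milk', 'Bread', 'Butter'},   # T4
]
n = len(transactions)

def support(itemset):
    # Fraction of transactions containing every item in the itemset
    return sum(1 for t in transactions if itemset <= t) / n

antecedent, consequent = {'Milk', 'Bread'}, {'Butter'}
sup = support(antecedent | consequent)   # 2/4 = 0.5
conf = sup / support(antecedent)         # 0.5 / 0.75 = 0.667
lift = conf / support(consequent)        # 0.667 / 0.75 = 0.889
print(round(sup, 3), round(conf, 3), round(lift, 3))
```

Rounded, these match the hand calculation above: support 50%, confidence 66.7%, lift 0.89.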
Algorithms
Apriori
Generates frequent itemsets level by level (breadth-first). Uses the property: "if an itemset is infrequent, all its supersets are infrequent." Simple but slow on large data.
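The level-wise idea can be sketched in plain Python (an illustrative toy, not the mlxtend implementation used later):

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    n = len(transactions)
    tsets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in tsets if itemset <= t) / n

    # Level 1: frequent single items
    items = {i for t in tsets for i in t}
    current = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join: union pairs of frequent (k-1)-itemsets into k-item candidates.
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        # (the Apriori property quoted above).
        candidates = set()
        for a in current:
            for b in current:
                u = a | b
                if len(u) == k and all(frozenset(s) in frequent
                                       for s in combinations(u, k - 1)):
                    candidates.add(u)
        current = [c for c in candidates if support(c) >= min_support]
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

transactions = [['Milk', 'Bread', 'Butter'], ['Bread', 'Butter'],
                ['Milk', 'Bread'], ['Milk', 'Bread', 'Butter']]
freq = apriori_frequent_itemsets(transactions, min_support=0.5)
```

On the four example transactions this finds seven frequent itemsets, up to {Milk, Bread, Butter} with support 0.5. The repeated full-data scans in support() are exactly why Apriori is slow on large data.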
FP-Growth
Builds a compressed tree (FP-tree) of transactions. Much faster than Apriori because it avoids candidate generation. Used in Apache Spark MLlib.
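A sketch of the tree-building half of FP-Growth (the recursive mining of conditional pattern bases is omitted; class and function names are illustrative):

```python
from collections import defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_count):
    # Pass 1: count item frequencies and drop infrequent items
    freq = defaultdict(int)
    for t in transactions:
        for item in t:
            freq[item] += 1
    freq = {i: c for i, c in freq.items() if c >= min_count}

    # Pass 2: insert each transaction with its items sorted by descending
    # frequency, so common prefixes share nodes -- this is the compression
    root = FPNode(None, None)
    for t in transactions:
        items = sorted((i for i in t if i in freq),
                       key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = FPNode(item, node)
            child.count += 1
            node = child
    return root

transactions = [['Milk', 'Bread', 'Butter'], ['Bread', 'Butter'],
                ['Milk', 'Bread'], ['Milk', 'Bread', 'Butter']]
root = build_fp_tree(transactions, min_count=2)
```

The ten item occurrences in the four example transactions compress into just four tree nodes, all hanging off a single Bread node with count 4.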
Eclat
Uses vertical data format (item -> transaction IDs). Finds frequent itemsets via set intersections. Faster than Apriori for dense datasets.
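The vertical-format idea in a few lines (an illustrative sketch; it reports absolute counts rather than support fractions):

```python
def eclat(transactions, min_count):
    # Vertical format: item -> set of IDs of transactions containing it
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, prefix_tids, remaining):
        for i, (item, tids) in enumerate(remaining):
            # Support of prefix + item is a set intersection: no data rescans
            new_tids = prefix_tids & tids if prefix else tids
            if len(new_tids) >= min_count:
                new_prefix = prefix + (item,)
                frequent[frozenset(new_prefix)] = len(new_tids)
                extend(new_prefix, new_tids, remaining[i + 1:])

    extend((), set(), sorted(tidsets.items(), key=lambda kv: kv[0]))
    return frequent

transactions = [['Milk', 'Bread', 'Butter'], ['Bread', 'Butter'],
                ['Milk', 'Bread'], ['Milk', 'Bread', 'Butter']]
freq = eclat(transactions, min_count=2)
```

On the example data this finds the same seven frequent itemsets as Apriori, but each extension costs only one set intersection.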
When to Use Which
Small data: Apriori. Large data: FP-Growth. Dense data with few items: Eclat.
Code: Apriori with mlxtend
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder
# Sample transactions
dataset = [
['Milk', 'Bread', 'Butter'],
['Bread', 'Butter'],
['Milk', 'Bread'],
['Milk', 'Bread', 'Butter']
]
# Convert to one-hot encoded DataFrame
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
print(df)
Find Frequent Itemsets
# Find frequent itemsets with minimum support of 50%
frequent_itemsets = apriori(df, min_support=0.5, use_colnames=True)
print(frequent_itemsets)
Generate Association Rules
# Generate rules with minimum confidence of 50%
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])
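In practice the rules table is then filtered and ranked. The snippet below uses a hand-built stand-in DataFrame with the same column names as mlxtend's output, so it runs without mlxtend; the rows and metric values are toy numbers, not computed from the example data:

```python
import pandas as pd

# Stand-in for the mlxtend rules DataFrame (same column names, toy values)
rules = pd.DataFrame({
    'antecedents': [frozenset({'diapers'}), frozenset({'bread'})],
    'consequents': [frozenset({'beer'}), frozenset({'jam'})],
    'support': [0.10, 0.05],
    'confidence': [0.60, 0.40],
    'lift': [1.5, 0.8],
})

# Keep only positively associated rules (lift > 1), strongest first
strong = rules[rules['lift'] > 1].sort_values('lift', ascending=False)
print(strong)
```

Sorting by lift rather than confidence avoids favoring rules whose consequent is simply popular everywhere.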
Real-World Applications
- Retail — Market Basket Analysis (Amazon, Walmart product placement)
- Recommendations — "Users who listened to X also liked Y"
- Web Mining — Which pages are visited together
- Healthcare — Which symptoms appear together (disease diagnosis)
When to Use Association Rules
| Good For | Not Ideal For |
| --- | --- |
| Transactional / basket data | Continuous numeric data |
| Product recommendation systems | Small number of transactions |
| Finding co-occurrence patterns | Causal inference (correlation != causation) |
| Cross-selling and upselling strategies | Real-time predictions |
Association rules find correlations, not causation. A rule like {diapers} -> {beer} means they are bought together, not that buying diapers causes someone to buy beer. Also, low min_support can produce an explosion of rules; start with higher thresholds and lower them gradually.
Tags: Unsupervised, Association Rules, Market Basket, Apriori, mlxtend