Association Rule Mining
An unsupervised technique to discover interesting relationships (rules) between items in transactional datasets.
What It Is
Association Rule Mining finds patterns like "If a customer buys X, they are likely to buy Y." It works on transactional data (shopping baskets, web clicks, symptoms) and produces rules evaluated by support, confidence, and lift.
The classic application is Market Basket Analysis: "Customers who bought bread and butter also bought milk." Amazon's "Frequently Bought Together" feature is a well-known example of this technique.
Key Metrics
Rule Evaluation Metrics
Support(A -> B) = how frequently A and B appear together across all transactions
= count(A and B) / total_transactions
Confidence(A -> B) = how often the rule holds when A is present
= count(A and B) / count(A)
Lift(A -> B) = rule strength relative to random chance
= confidence(A -> B) / support(B)
Lift > 1 = positive association (A makes B more likely)
Lift = 1 = independent
Lift < 1 = negative association (A makes B less likely)
Example
Transactions:
T1: Milk, Bread, Butter
T2: Bread, Butter
T3: Milk, Bread
T4: Milk, Bread, Butter
Rule: {Milk, Bread} -> {Butter}
Support = 2/4 = 50%
Confidence = 2/3 = 66.7%
Lift = 0.667 / 0.75 = 0.89 (lift < 1: despite the 66.7% confidence, butter is slightly less likely given milk and bread than it is overall, because butter already appears in 75% of baskets)
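The example numbers can be reproduced with a few lines of Python (the transaction list and the support helper below are illustrative, not part of any library):

```python
# The four example transactions as sets of items
transactions = [
    {'Milk', 'Bread', 'Butter'},   # T1
    {'Bread', 'Butter'},           # T2
    {'Milk', 'Bread'},             # T3
    {'Milk', 'Bread', 'Butter'},   # T4
]
n = len(transactions)

def support(itemset):
    # Fraction of transactions containing every item in the itemset
    return sum(1 for t in transactions if itemset <= t) / n

antecedent, consequent = {'Milk', 'Bread'}, {'Butter'}
sup = support(antecedent | consequent)   # 2/4 = 0.5
conf = sup / support(antecedent)         # 0.5 / 0.75 = 0.667
lift = conf / support(consequent)        # 0.667 / 0.75 = 0.889
print(round(sup, 3), round(conf, 3), round(lift, 3))
```

Rounded, these match the hand calculation above: support 50%, confidence 66.7%, lift 0.89.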
Algorithms
Apriori
Generates frequent itemsets level by level (breadth-first). Uses the property: "if an itemset is infrequent, all its supersets are infrequent." Simple but slow on large data.
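The level-wise idea can be sketched in plain Python (an illustrative toy, not the mlxtend implementation used later):

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    n = len(transactions)
    tsets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in tsets if itemset <= t) / n

    # Level 1: frequent single items
    items = {i for t in tsets for i in t}
    current = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join: union pairs of frequent (k-1)-itemsets into k-item candidates.
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        # (the Apriori property quoted above).
        candidates = set()
        for a in current:
            for b in current:
                u = a | b
                if len(u) == k and all(frozenset(s) in frequent
                                       for s in combinations(u, k - 1)):
                    candidates.add(u)
        current = [c for c in candidates if support(c) >= min_support]
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

transactions = [['Milk', 'Bread', 'Butter'], ['Bread', 'Butter'],
                ['Milk', 'Bread'], ['Milk', 'Bread', 'Butter']]
freq = apriori_frequent_itemsets(transactions, min_support=0.5)
```

On the four example transactions this finds seven frequent itemsets, up to {Milk, Bread, Butter} with support 0.5. The repeated full-data scans in support() are exactly why Apriori is slow on large data.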
FP-Growth
Builds a compressed tree (FP-tree) of transactions. Much faster than Apriori because it avoids candidate generation. Used in Apache Spark MLlib.
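A sketch of the tree-building half of FP-Growth (the recursive mining of conditional pattern bases is omitted; class and function names are illustrative):

```python
from collections import defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_count):
    # Pass 1: count item frequencies and drop infrequent items
    freq = defaultdict(int)
    for t in transactions:
        for item in t:
            freq[item] += 1
    freq = {i: c for i, c in freq.items() if c >= min_count}

    # Pass 2: insert each transaction with its items sorted by descending
    # frequency, so common prefixes share nodes -- this is the compression
    root = FPNode(None, None)
    for t in transactions:
        items = sorted((i for i in t if i in freq),
                       key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = FPNode(item, node)
            child.count += 1
            node = child
    return root

transactions = [['Milk', 'Bread', 'Butter'], ['Bread', 'Butter'],
                ['Milk', 'Bread'], ['Milk', 'Bread', 'Butter']]
root = build_fp_tree(transactions, min_count=2)
```

The ten item occurrences in the four example transactions compress into just four tree nodes, all hanging off a single Bread node with count 4.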
Eclat
Uses vertical data format (item -> transaction IDs). Finds frequent itemsets via set intersections. Faster than Apriori for dense datasets.
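The vertical-format idea in a few lines (an illustrative sketch; it reports absolute counts rather than support fractions):

```python
def eclat(transactions, min_count):
    # Vertical format: item -> set of IDs of transactions containing it
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, prefix_tids, remaining):
        for i, (item, tids) in enumerate(remaining):
            # Support of prefix + item is a set intersection: no data rescans
            new_tids = prefix_tids & tids if prefix else tids
            if len(new_tids) >= min_count:
                new_prefix = prefix + (item,)
                frequent[frozenset(new_prefix)] = len(new_tids)
                extend(new_prefix, new_tids, remaining[i + 1:])

    extend((), set(), sorted(tidsets.items(), key=lambda kv: kv[0]))
    return frequent

transactions = [['Milk', 'Bread', 'Butter'], ['Bread', 'Butter'],
                ['Milk', 'Bread'], ['Milk', 'Bread', 'Butter']]
freq = eclat(transactions, min_count=2)
```

On the example data this finds the same seven frequent itemsets as Apriori, but each extension costs only one set intersection.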
When to Use Which
Small data: Apriori. Large data: FP-Growth. Dense data with few items: Eclat.
Code: Apriori with mlxtend
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder
# Sample transactions
dataset = [
['Milk', 'Bread', 'Butter'],
['Bread', 'Butter'],
['Milk', 'Bread'],
['Milk', 'Bread', 'Butter']
]
# Convert to one-hot encoded DataFrame
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
print(df)
Find Frequent Itemsets
# Find frequent itemsets with minimum support of 50%
frequent_itemsets = apriori(df, min_support=0.5, use_colnames=True)
print(frequent_itemsets)
Generate Association Rules
# Generate rules with minimum confidence of 50%
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])
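In practice the rules table is then filtered and ranked. The snippet below uses a hand-built stand-in DataFrame with the same column names as mlxtend's output, so it runs without mlxtend; the rows and metric values are toy numbers, not computed from the example data:

```python
import pandas as pd

# Stand-in for the mlxtend rules DataFrame (same column names, toy values)
rules = pd.DataFrame({
    'antecedents': [frozenset({'diapers'}), frozenset({'bread'})],
    'consequents': [frozenset({'beer'}), frozenset({'jam'})],
    'support': [0.10, 0.05],
    'confidence': [0.60, 0.40],
    'lift': [1.5, 0.8],
})

# Keep only positively associated rules (lift > 1), strongest first
strong = rules[rules['lift'] > 1].sort_values('lift', ascending=False)
print(strong)
```

Sorting by lift rather than confidence avoids favoring rules whose consequent is simply popular everywhere.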
Real-World Applications
- Retail — Market Basket Analysis (Amazon, Walmart product placement)
- Recommendations — "Users who listened to X also liked Y"
- Web Mining — Which pages are visited together
- Healthcare — Which symptoms appear together (disease diagnosis)
When to Use Association Rules
| Good For | Not Ideal For |
| --- | --- |
| Transactional / basket data | Continuous numeric data |
| Product recommendation systems | Small number of transactions |
| Finding co-occurrence patterns | Causal inference (correlation != causation) |
| Cross-selling and upselling strategies | Real-time predictions |
Association rules find correlations, not causation. A rule like {diapers} -> {beer} means they are bought together, not that buying diapers causes someone to buy beer. Also, low min_support can produce an explosion of rules; start with higher thresholds and lower them gradually.
Tags: Unsupervised, Association Rules, Market Basket, Apriori, mlxtend