Association rule mining
is an unsupervised learning technique used to discover interesting relationships between items in large datasets, particularly in market basket analysis.
Mining Frequent Itemsets
Finding combinations of items that occur together frequently in transactions
The Apriori Algorithm
Principle: If an itemset is frequent, all its subsets must also be frequent (anti-monotone property)
How it works (step by step):
1. Find frequent individual items
   - Count how often each item appears.
   - Keep only the items that meet the minimum support threshold.
2. Generate larger itemsets
   - Combine frequent items to form 2-item sets, 3-item sets, and so on.
   - Keep only those that are still frequent (meet the support threshold).
   - Stop when no larger frequent sets can be made.
3. Generate rules from frequent itemsets
   - For each frequent itemset, create rules like A → B.
   - Keep only the rules with high confidence (above the threshold).
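The steps above can be sketched in Python. This is a minimal illustration on a made-up basket dataset (the transactions, item names, and thresholds are invented for the example), not an optimized implementation:

```python
from itertools import combinations

def apriori(transactions, min_sup, min_conf):
    """Minimal Apriori sketch: transactions are sets of item names."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Step 1: keep individual items that meet the minimum support threshold.
    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_sup}
    all_frequent = set(level)

    # Step 2: grow itemsets one item at a time until nothing survives.
    k = 1
    while level:
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = {c for c in candidates if support(c) >= min_sup}
        all_frequent |= level

    # Step 3: generate rules A -> B and keep those above the confidence threshold.
    rules = []
    for itemset in all_frequent:
        for r in range(1, len(itemset)):
            for A in map(frozenset, combinations(itemset, r)):
                conf = support(itemset) / support(A)
                if conf >= min_conf:
                    rules.append((set(A), set(itemset - A), conf))
    return all_frequent, rules

# Toy basket data (invented for illustration):
transactions = [{"beer", "wine"}, {"beer", "wine", "chips"},
                {"beer", "chips"}, {"wine"}]
frequent, rules = apriori(transactions, min_sup=0.5, min_conf=0.6)
```

On this data, {beer, wine} and {beer, chips} come out frequent at min_sup = 0.5, while {wine, chips} does not, and rules such as chips → beer (confidence 1.0) survive the 0.6 cutoff.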
Confidence
Confidence measures how often the rule holds: confidence(A → B) = support(A ∪ B) / support(A). E.g., given that someone buys beer, how often do they also buy wine?
Drawback of confidence
Confidence can be misleading because it ignores how common the right-hand-side item already is. For example, if P(Coffee | Tea) = 0.75 but P(Coffee) = 0.9, buying tea actually reduces the likelihood of buying coffee. Lift = confidence(A → B) / P(B) corrects for this: lift < 1 signals a negative association.
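A small sketch of the tea/coffee example above, with 20 made-up transactions constructed so that P(Coffee | Tea) = 0.75 and P(Coffee) = 0.9:

```python
def confidence(transactions, A, B):
    # P(B | A): fraction of transactions containing A that also contain B.
    n_A = sum(1 for t in transactions if A <= t)
    n_AB = sum(1 for t in transactions if (A | B) <= t)
    return n_AB / n_A

def lift(transactions, A, B):
    # lift = confidence(A -> B) / P(B); lift < 1 means A makes B *less* likely.
    p_B = sum(1 for t in transactions if B <= t) / len(transactions)
    return confidence(transactions, A, B) / p_B

# Invented data: tea in 4 transactions (3 with coffee), coffee in 18 of 20.
transactions = ([{"tea", "coffee"}] * 3 + [{"tea"}] +
                [{"coffee"}] * 15 + [{"juice"}])
c = lift_input = confidence(transactions, {"tea"}, {"coffee"})  # 0.75
l = lift(transactions, {"tea"}, {"coffee"})                     # 0.75 / 0.9 ≈ 0.83
```

Even though the rule Tea → Coffee has a seemingly high confidence of 0.75, its lift is below 1, exposing the negative association that confidence alone hides.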
What are closed and maximal itemsets?
Given a set of transactions, which frequent itemsets are closed and which are maximal?
An itemset is closed if no proper superset has the same support.
An itemset is maximal if it is frequent and has no frequent superset.
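These two definitions can be checked directly on a small made-up transaction set (the transactions and the absolute min_sup count below are invented for illustration):

```python
from itertools import combinations

# Invented transactions; min_sup is an absolute count here.
transactions = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}]
min_sup = 2

def support(s):
    return sum(1 for t in transactions if s <= t)

items = sorted({i for t in transactions for i in t})
all_itemsets = [frozenset(c) for r in range(1, len(items) + 1)
                for c in combinations(items, r)]
frequent = {s: support(s) for s in all_itemsets if support(s) >= min_sup}

# Closed: no proper superset has the same support. (For a frequent itemset,
# any equal-support superset is itself frequent, so scanning `frequent` suffices.)
closed = [s for s, sup in frequent.items()
          if not any(s < t and frequent[t] == sup for t in frequent)]

# Maximal: frequent with no frequent proper superset.
maximal = [s for s in frequent if not any(s < t for t in frequent)]
```

Here {A} is closed but not maximal (its supersets {A, B} and {A, C} are frequent but have lower support), {B} and {C} are neither (each has a superset with the same support), and {A, B} and {A, C} are both closed and maximal.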
Frequent?
An itemset is frequent if its support ≥ min_sup (the minimum support threshold).