Given an input dataset and the Apriori algorithm, how to trace the algorithm for intermediate results? (Review)
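A minimal sketch of tracing Apriori level by level, assuming minimum support is given as an absolute count. The name apriori_trace, the toy transactions, and the threshold of 2 are illustrative assumptions, not part of the original notes. Each trace entry holds the candidate counts (C_k) and the frequent itemsets (L_k) for one level.

```python
from itertools import combinations

def apriori_trace(transactions, min_support_count):
    """Return one (k, candidate_counts, frequent_itemsets) tuple per level."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    trace = []
    k = 1
    while candidates:
        # Scan the data once to count the support of every candidate in C_k.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        # L_k keeps only the candidates that meet the minimum support count.
        frequent = {c: n for c, n in counts.items() if n >= min_support_count}
        trace.append((k, counts, frequent))
        # Join step: union pairs of frequent k-itemsets into (k+1)-itemsets.
        # Prune step (apriori property): every k-subset must itself be frequent.
        next_candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(s) in frequent for s in combinations(union, k)
            ):
                next_candidates.add(union)
        candidates = sorted(next_candidates, key=sorted)
        k += 1
    return trace

# Hypothetical toy dataset, used only to illustrate the trace.
toy_transactions = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
]
trace = apriori_trace(toy_transactions, min_support_count=2)
for k, counts, frequent in trace:
    print(f"C{k}:", {"".join(sorted(c)): n for c, n in counts.items()})
    print(f"L{k}:", {"".join(sorted(c)): n for c, n in frequent.items()})
```

Printing C_k and L_k side by side shows which candidates each pass eliminates.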

How to derive strong rules from the given frequent itemsets L and a conf_rate (min_conf)?
Generate candidate rules from each frequent itemset, then filter out the rules with conf < min_conf; the rules that remain are strong.

How to improve the efficiency of the rule generation procedure by applying the apriori property?
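Prune while generating rules: if a rule fails min_conf, every rule from the same itemset whose consequent is a superset of the failed consequent also fails, so it need not be generated.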
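A minimal sketch of rule generation with pruning, assuming the support counts of all frequent itemsets are already available in a dictionary. The name strong_rules, the toy support counts, and min_conf = 0.8 are illustrative assumptions. Consequents grow one item at a time, and only consequents that passed min_conf are extended.

```python
from itertools import combinations

def strong_rules(support, min_conf):
    """support maps each frequent itemset (frozenset) to its support count.
    Returns a list of (antecedent, consequent, confidence) tuples."""
    rules = []
    for itemset, sup in support.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        consequents = {frozenset([i]) for i in itemset}
        while consequents:
            passed = set()
            for cons in consequents:
                ante = itemset - cons
                conf = sup / support[ante]  # conf(ante -> cons)
                if conf >= min_conf:
                    rules.append((ante, cons, conf))
                    passed.add(cons)
            size = len(next(iter(consequents))) + 1
            if size >= len(itemset):
                break  # a larger consequent would leave an empty antecedent
            # Pruning: build larger consequents only from those that passed,
            # and require every smaller sub-consequent to have passed too.
            consequents = {
                a | b
                for a, b in combinations(passed, 2)
                if len(a | b) == size
                and all(frozenset(s) in passed for s in combinations(a | b, size - 1))
            }
    return rules

# Hypothetical support counts for a toy dataset (illustration only).
toy_support = {
    frozenset("A"): 2, frozenset("B"): 3, frozenset("C"): 3, frozenset("E"): 3,
    frozenset("AC"): 2, frozenset("BC"): 2, frozenset("BE"): 3, frozenset("CE"): 2,
    frozenset("BCE"): 2,
}
rules = strong_rules(toy_support, min_conf=0.8)
for ante, cons, conf in rules:
    print("".join(sorted(ante)), "->", "".join(sorted(cons)), f"conf={conf:.2f}")
```

The pruning is safe because moving items from the antecedent to the consequent can only raise the antecedent's support, which lowers or preserves the confidence.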

What are the two general purposes of DM? Use some examples of mined association patterns to explain each purpose.
How can the association mining process be mapped to the empirical cycle model of scientific research?

Why is classification mining a supervised learning process? How about association mining?
What are the major phases of conducting a classification mining application?
Can you describe a mapping between a classification application process and the empirical cycle?

What is the general idea/strategy/method/algorithm of DT induction for classification mining?

What is the general strategy of Inductive Learning (via observing examples)?
What are the major technical issues of the DT induction approach for classification mining?
What is the heuristic function used in the ID3 algorithm for evaluating search directions?
Entropy Calculation
What is the notion of Information Gain, and how is it applied in the ID3 algorithm?
How to convert the ID3 algorithm into an implementation code structure?
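One possible code structure for the questions above, as a minimal sketch: an entropy function, an information gain function built on it, and a recursive ID3 procedure that splits on the attribute with the highest gain. The function names, the nested-dict tree representation, and the toy rows are illustrative assumptions; missing values and numeric attributes are not handled.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attr, target):
    """Entropy of the target before the split minus the weighted entropy after it."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def id3(rows, attrs, target):
    """Return a class label (leaf) or a nested dict {attr: {value: subtree}}."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:
        return labels[0]  # pure node: stop
    if not attrs:
        return Counter(labels).most_common(1)[0][0]  # no attributes left: majority class
    # Heuristic step: choose the attribute with the highest information gain.
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return {
        best: {
            v: id3([r for r in rows if r[best] == v], rest, target)
            for v in {r[best] for r in rows}
        }
    }

# Hypothetical toy training set (illustration only).
toy_rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
tree = id3(toy_rows, ["outlook", "windy"], "play")
print(tree)
```

In this toy set, outlook separates the classes completely while windy does not separate them at all, so outlook is chosen as the root.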

How to quantify information contained in a message?
Q(message) = log2 P2(outcome_after) - log2 P1(outcome_before)
Q, the quantity of information contained in a message.
P1, the probability of outcome before receiving a message.
P2, the probability of outcome after receiving the message.
Suppose a missing cow has strayed into a pasture represented as an 8 x 8 array of “cells”.
Question: Where is the cow?
Outcome: the probability of finding the cow.
Answer 1: Nobody knows.
Answer 2: The cow is in cell (4, 7).
What is the information received?
Outcome1 (cow before): P1 = 1/64
Outcome2 (cow after): P2 = 1
Information received = log2 P2 - log2 P1
= log2 (P2/P1)
= log2 (1 / (1/64))
= log2 (64)
= 6 bits
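The calculation above can be checked with a short sketch. The function name information_received is an illustrative assumption; it simply evaluates log2(P2/P1) for the two answers in the cow example.

```python
import math

def information_received(p_before, p_after):
    """Bits of information in a message that changes the probability of the outcome."""
    return math.log2(p_after / p_before)

cells = 8 * 8
# Answer 2: "The cow is in cell (4, 7)" raises the probability from 1/64 to 1.
bits_answer_2 = information_received(1 / cells, 1.0)
# Answer 1: "Nobody knows" leaves the probability unchanged at 1/64.
bits_answer_1 = information_received(1 / cells, 1 / cells)
print(bits_answer_2, bits_answer_1)
```

A message that does not change the probability of the outcome carries 0 bits.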
What are the formulas for the message and the information received?
How can the concept in 1 be applied to a classification method, such as the ID3 algorithm?
[????]
What are entropy and information gain? How to use information gain for choosing an attribute?
What is ID3’s inductive bias?
There is a natural bias in the information gain measure that favors attributes with many distinct values over those with few distinct values.
What is overfitting?
Overfitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations.
What are the technical options for overcoming the bias problem?
How does a classification task done by DT induction differ from one done by a Naïve Bayes classifier? (*Give 3 differences.)
[Need review]
What are the two assumptions for using the NB classifier?