given a collection of records, each record is characterized by a tuple (x, y), where x is the attribute set and y is the class label.
Classification
6 CLASSIFICATION TECHNIQUES
DECISION TREE INDUCTION:
The training set is used to induce a model (Learning Algorithm -> Learn Model);
the model takes the form of a decision tree,
which we can then apply to the test set (deduction).
is a type of algorithm that uses attribute tests to split the data recursively, until each split contains records of only a single class.
Hunt’s Algorithm
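The recursive splitting described above can be sketched in Python. This is a minimal illustration of the idea behind Hunt's Algorithm, not a faithful reproduction of any particular implementation; the function and parameter names are my own, and for simplicity it splits on attributes in a fixed order rather than choosing the best split.

```python
from collections import Counter

def hunts_algorithm(records, attributes):
    """Sketch of Hunt's Algorithm (illustrative, simplified).

    records: list of (attribute_dict, label) tuples
    attributes: attribute names still available for splitting
    Returns a class label (leaf) or a nested dict (internal node).
    """
    labels = [y for _, y in records]
    # Stop: all records belong to one class -> leaf with that class
    if len(set(labels)) == 1:
        return labels[0]
    # Stop: no attributes left -> leaf labeled with the majority class
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Split on the next attribute (a full implementation would pick
    # the attribute that gives the best impurity reduction)
    attr = attributes[0]
    groups = {}
    for x, y in records:
        groups.setdefault(x[attr], []).append((x, y))
    return {
        "split_on": attr,
        "children": {v: hunts_algorithm(subset, attributes[1:])
                     for v, subset in groups.items()},
    }
```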
4 TYPES OF ATTRIBUTES
2 TEST CONDITIONS FOR NOMINAL ATTRIBUTES
2 TEST CONDITIONS FOR ORDINAL ATTRIBUTES
is an approach to finding the best split in which nodes with a homogeneous class distribution are preferred.
Greedy Approach
Formula for the general framework when finding the best split:
M0 is the impurity of the parent node.
M12 is the weighted average impurity of child Nodes 1 and 2 (split A).
M34 is the weighted average impurity of child Nodes 3 and 4 (split B).
Gain = M0 - M12 vs. M0 - M34; choose the split with the larger gain.
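The gain comparison above can be sketched as follows. This is my own illustration, using the Gini index as the impurity measure and a made-up two-class parent node; the point is that the split with purer children yields the larger gain M0 minus the weighted child impurity.

```python
def gini(labels):
    # Gini index of a node: 1 - sum of squared class proportions
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_gain(parent, children):
    # Gain = M0 (parent impurity) - weighted average child impurity
    n = len(parent)
    weighted = sum(len(ch) / n * gini(ch) for ch in children)
    return gini(parent) - weighted

# Hypothetical parent node with 6 records of two classes
parent = ["A", "A", "A", "B", "B", "B"]
split_a = [["A", "A", "A"], ["B", "B", "B"]]   # pure children
split_b = [["A", "A", "B"], ["A", "B", "B"]]   # mixed children
# split_gain(parent, split_a) > split_gain(parent, split_b),
# so split A is preferred
```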
3 WAYS TO MEASURE NODE IMPURITY
is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it were labeled randomly according to the distribution of labels in the subset.
Gini Impurity / Index
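A short sketch of the Gini index for a single node, using the standard formula 1 minus the sum of squared class proportions (the function name is my own):

```python
def gini_index(labels):
    """Gini impurity of a node: 1 - sum(p_i^2) over the classes
    present, where p_i is the proportion of class i in the node."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

# A pure node has impurity 0; an evenly split two-class node has 0.5
```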
measures the homogeneity of a node: the uncertainty of a random variable, or the information content of a message.
Entropy
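Entropy for a single node can be sketched the same way, using the standard formula: the negative sum of p log2(p) over the classes present (function name is illustrative):

```python
import math

def entropy(labels):
    # Entropy of a node: -sum(p_i * log2(p_i)) over classes present
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

# A pure node has entropy 0; an evenly split two-class node has 1.0
```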
measures the misclassification error made by a node.
Classification Error
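Classification error for a single node can be sketched as 1 minus the largest class proportion, i.e. the error rate of predicting the node's majority class (function name is my own):

```python
def classification_error(labels):
    # Classification error: 1 - max class proportion in the node
    n = len(labels)
    return 1.0 - max(labels.count(c) / n for c in set(labels))

# A pure node has error 0; an evenly split two-class node has 0.5
```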
3 STOPPING CRITERIA FOR TREE INDUCTION
4 ADVANTAGES OF DECISION TREE BASED CLASSIFICATION