Unsupervised Learning
Single Linkage Cluster - SLC
Single Linkage Cluster Run Time
O(n^3)
Issues with SLC
Might end up with wrong clusters, depending on the distance definition
k-means clustering
Configurations (k-means as an optimization problem)
center
P
Scores (k-means as an optimization problem)
How much error you introduce by representing these points as centers
Neighborhood (k-means as an optimization problem)
The neighborhood of a configuration is the set of pairs where you keep the centers the same and you change the partitions or you keep the partitions the same and you move the centers
Properties of k-means clustering
Soft Clustering
Attaches cluster probability to each point instead of specific cluster
Task is to find a hypothesis that maximizes the probability of the data (Maximum likelihood)
Maximum Likelihood Gaussian
The Maximum Likelihood mean of the Gaussian u is the mean of the data
Expectation Maximization
EM properties
Richness (Clustering Properties)
For any assignment C of objects to clusters, there’s some distance matrix D such that P_d returns that clustering C
Scale-invariance (Clustering Properties)
Scaling distances by a positive value doesn’t change the clustering
Consistency (Clustering Properties)
Shrinking intra-cluster distances (moving points towards each other) and expanding inter-cluster (moving clusters away from each other) distances doesn’t change the clustering
Impossibility Theorem (Clustering Properties)
There’s no clustering algorithm that can achieve the three properties
Time to select most relevant subset of features M, where M <= N
2^N
Approaches to Feature Selection
Filtering and Wrapping
Filtering (Feature Selection)
Wrapping
We can use different techniques to search:
1. Randomized Optimization algorithms
2. Forward selection: select a feature and evaluate the effect of creating different combinations of this feature with other features. Stop once you stagnate. This is similar to hill climbing
3. Backward Elimination: start with all the features and evaluate the effect of eliminating each feature
Forward selection
select a feature and evaluate the effect of creating different combinations of this feature with other features. Stop once you stagnate. This is similar to hill climbing
Backward Elimination
start with all the features and evaluate the effect of eliminating each feature
x_i is <> relevant if removing it degrades the Bayes Optimal Classifier
strongly