Purpose of PCA
( the principal components are ordered by the variance they explain )
-> finds a sequence of linear combinations of the variables that have maximal variance and are mutually uncorrelated
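The idea above — ordered, mutually uncorrelated linear combinations of maximal variance — can be sketched with a plain NumPy SVD on centered data (the toy data and variable names are illustrative):

```python
import numpy as np

# Synthetic data: 100 observations, 3 correlated variables (for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 1] += 0.8 * X[:, 0]          # introduce correlation between variables

Xc = X - X.mean(axis=0)           # center each variable
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                # principal component scores

var = scores.var(axis=0)          # variances come out in decreasing order
corr = np.corrcoef(scores, rowvar=False)
print(var)                        # decreasing
print(corr.round(6))              # off-diagonal entries ~ 0 (uncorrelated)
```

The component scores have diagonal covariance, which is exactly the "mutually uncorrelated" property on the card.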
Purpose of clustering
Discovering unknown sub-groups (homogeneous clusters) in data
Unsupervised Learning methods
PCA
Clustering
( data exploration and low-complexity data description )
SVM vs. Logistic Regression for (almost) separable classes
SVM
SVM vs. Logistic Regression for non-separable classes
SIMILAR
SVM and Logistic regression (with ridge penalty)
SVM vs. Logistic Regression for estimating probabilities
Logistic regression
SVM vs. Logistic Regression for a fast and interpretable model
Logistic regression
SVM vs. Logistic Regression for non-linear boundaries
kernel SVMs
( kernel logistic regression is computationally expensive )
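The similarity claimed above for the non-separable case comes from the loss functions: the SVM hinge loss and the logistic (deviance) loss penalize a point through its margin y·f(x) in much the same way. A small numeric comparison on made-up margin values:

```python
import numpy as np

# Margin values y*f(x): negative = wrong side, positive = right side
m = np.linspace(-2, 2, 9)

hinge = np.maximum(0, 1 - m)       # SVM hinge loss
logloss = np.log1p(np.exp(-m))     # logistic regression (deviance) loss

# Both losses are large for negative margins and shrink toward 0
# for confidently correct points; hinge is exactly 0 beyond margin 1.
print(np.c_[m, hinge, logloss].round(3))
```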
K-means clustering partition requirements
Each observation belongs to exactly one of the K clusters: the partition is exhaustive (every observation is assigned) and non-overlapping.
Clustering potential issues
Within-Cluster-Variation formula
W(C_k) = (1 / |C_k|) * Σ_{i,i' ∈ C_k} Σ_{j=1}^{p} (x_{ij} - x_{i'j})²
( usually the cumulative value Σ_{k=1}^{K} W(C_k) is reported )
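The within-cluster variation — the sum over clusters of pairwise squared Euclidean distances, scaled by cluster size — can be checked with a direct implementation (a sketch; the function name is illustrative):

```python
import numpy as np

def within_cluster_variation(X, labels):
    """Sum over clusters of (1/|C_k|) * sum of pairwise squared distances."""
    total = 0.0
    for k in np.unique(labels):
        C = X[labels == k]
        # all pairwise squared Euclidean distances within the cluster
        d2 = ((C[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        total += d2.sum() / len(C)
    return total
```

Note that this scaled pairwise sum equals twice the sum of squared distances to the cluster centroid, which is why K-means can minimize it via centroids.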
K-means clustering procedure
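The standard procedure (Lloyd's algorithm: assign each point to its nearest centroid, then recompute the centroids, repeat until stable) can be sketched in plain NumPy (parameter names and defaults are illustrative):

```python
import numpy as np

def kmeans(X, K, n_iter=50, seed=0):
    """Minimal Lloyd's algorithm sketch (no convergence check, fixed iterations)."""
    rng = np.random.default_rng(seed)
    # start from K randomly chosen observations as initial centroids
    centroids = X[rng.choice(len(X), K, replace=False)]
    for _ in range(n_iter):
        # assign every observation to its closest centroid
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # recompute each centroid as the mean of its assigned observations
        for k in range(K):
            if (labels == k).any():
                centroids[k] = X[labels == k].mean(axis=0)
    return labels, centroids
```

Because the result depends on the random initialization, the algorithm is usually run several times and the solution with the lowest total within-cluster variation is kept.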
Hierarchical clustering procedure
Start with each observation as its own cluster; repeatedly fuse the two least dissimilar clusters (according to the chosen linkage) until all observations form one cluster.
( benefit: the number of clusters K need not be fixed in advance; the dendrogram encodes clusterings for every K )
Linkage / Dissimilarity measures
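The common linkage rules differ only in how the pairwise distances between two clusters are summarized into a single dissimilarity. A minimal sketch (assuming Euclidean distance; the function name is illustrative):

```python
import numpy as np

def linkage_dissimilarity(A, B, method="complete"):
    """Dissimilarity between clusters A and B under common linkage rules."""
    # all pairwise Euclidean distances between the two clusters
    d = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2))
    if method == "complete":
        return d.max()    # largest pairwise distance
    if method == "single":
        return d.min()    # smallest pairwise distance
    if method == "average":
        return d.mean()   # mean of all pairwise distances
    raise ValueError(f"unknown linkage: {method}")
```

Complete linkage tends to give compact clusters, while single linkage can produce long, chained clusters.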
Hyperplane / SVM procedure
Among all separating hyperplanes, find the one that makes the biggest gap or margin between the two classes == Maximal Margin Classifier
If not possible:
loosen the "separate" requirement (slack variables)
enlarge the feature space so that separation becomes possible (e.g. feature expansion with transformed variables -> non-linear boundaries)
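The feature-expansion idea can be seen on a 1-D toy example (all data here is made up for illustration): the classes are not separable by any threshold on x alone, but adding x² as a feature makes a linear boundary possible.

```python
import numpy as np

# Class depends on |x|: +1 for large |x|, -1 near zero.
x = np.array([-2.0, -1.5, -0.3, 0.2, 1.4, 2.1])
y = np.array([1, 1, -1, -1, 1, 1])

# No single threshold on x separates the classes (+1 on both ends)...
# ...but in the expanded space (x, x**2) the linear boundary x**2 = 1 does:
pred = np.where(x**2 > 1, 1, -1)
print((pred == y).all())   # True: perfectly separated in the expanded space
```

A boundary that is linear in the expanded features (x, x²) corresponds to a non-linear boundary in the original variable x — exactly the trick kernels exploit implicitly.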
Popular kernel functions
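The usual candidates — linear, polynomial, and radial (RBF) kernels — can be written out directly (the hyperparameter values gamma, degree, and coef0 below are illustrative defaults, not prescribed choices):

```python
import numpy as np

def linear_kernel(x, z):
    """K(x, z) = <x, z>"""
    return x @ z

def polynomial_kernel(x, z, degree=3, coef0=1.0):
    """K(x, z) = (<x, z> + coef0)^degree"""
    return (x @ z + coef0) ** degree

def rbf_kernel(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2), the radial basis function kernel"""
    return np.exp(-gamma * np.sum((x - z) ** 2))
```

Each kernel computes an inner product in an (implicit) enlarged feature space, which is what lets the SVM fit non-linear boundaries without ever forming the expanded features explicitly.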