PCA (Principal Components Analysis)
k-means Clustering
Hierarchical Clustering
What is a principal component?
A normalized linear combination of the original features, constructed to capture as much of the variance in the data as possible.
Loadings
The coefficients (one per original feature) that define a principal component as a linear combination of the predictors.
First principal component
How are values for the first principal component loadings determined?
By maximizing the sample variance of the first principal component
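A minimal numpy sketch of this idea (data and variable names are illustrative): the first loading vector is the direction whose projection has maximal sample variance, so no random unit direction can beat it.

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-feature data, centred before PCA
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.0], [1.0, 0.5]])
X = X - X.mean(axis=0)

cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
w1 = eigvecs[:, -1]                      # loading vector of PC1

def proj_var(w):
    """Sample variance of the projection X @ w."""
    return np.var(X @ w, ddof=1)

# PC1's variance is at least as large as along any random unit direction
for _ in range(100):
    v = rng.normal(size=2)
    v /= np.linalg.norm(v)
    assert proj_var(w1) >= proj_var(v) - 1e-12
```

The variance along `w1` equals the largest eigenvalue of the covariance matrix, which is why eigendecomposition (below) solves this maximization.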
Second principal component
Linear combination of features that maximizes the remaining variability in the dataset (not captured by the 1st principal component)
What is the dot product of the loading vectors for PC1 and PC2? Why?
0, because they are orthogonal
How can we solve for loading vectors?
Eigendecomposition (not tested) of the covariance matrix. This produces eigenvalues (the variances of each PC) and eigenvectors (the loading vectors).
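A numpy sketch of that decomposition (simulated data): projecting onto the eigenvectors and taking the variance of each score column recovers the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 4))
X = X - X.mean(axis=0)

cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order

scores = X @ eigvecs                       # project onto the loading vectors
pc_vars = scores.var(axis=0, ddof=1)       # variance of each PC's scores
print(np.allclose(pc_vars, eigvals))       # True: eigenvalues ARE the PC variances
```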
What is the max number of distinct principal components that can be created?
For a dataset with n observations and p features, the max number of PCs is
min(n-1,p)
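One way to see the min(n-1, p) limit with numpy (illustrative sizes): centring the data removes one degree of freedom, so the rank of the centred matrix, which equals the number of PCs with non-zero variance, is at most n-1.

```python
import numpy as np

rng = np.random.default_rng(3)
# p = 5 features but only n = 4 observations -> at most n-1 = 3 PCs
X = rng.normal(size=(4, 5))
X = X - X.mean(axis=0)             # centring removes one degree of freedom

# Rank of the centred data matrix = number of PCs with non-zero variance
print(np.linalg.matrix_rank(X))    # 3 = min(n-1, p)
```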
Distinct principal component
Distinct if its variance is non-zero (meaning the component still captures some additional variance in the dataset).
Biplot
A single plot that displays both the PC scores of the observations and the loading vectors of the features (typically for PC1 vs PC2).
Why is scaling necessary in PCA?
Because PCA maximizes variance: without standardizing each feature to unit variance, features measured on larger scales dominate the loadings.
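A numpy sketch of the scaling problem (feature names and units are illustrative): a variable measured in grams has a huge variance compared to one in metres, so without scaling it hijacks PC1.

```python
import numpy as np

rng = np.random.default_rng(4)
height_m = rng.normal(1.7, 0.1, size=200)        # metres: tiny variance
weight_g = rng.normal(70_000, 10_000, size=200)  # grams: huge variance
X = np.column_stack([height_m, weight_g])
X = X - X.mean(axis=0)

def pc1(X):
    _, vecs = np.linalg.eigh(np.cov(X, rowvar=False))
    return vecs[:, -1]

print(np.abs(pc1(X)))                   # unscaled: weight dominates, ~[0, 1]
Xs = X / X.std(axis=0, ddof=1)          # standardize to unit variance
print(np.abs(pc1(Xs)))                  # scaled: loadings balanced, ~[0.71, 0.71]
```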
Does PCR perform feature selection?
NO. PCR does not perform feature selection: every principal component is a linear combination of all p original variables.
The first principal component: (2)
1. The normalized linear combination of the features with the largest sample variance.
2. Equivalently, the direction in feature space along which the observations vary the most.
How does PCA reduce dimensionality?
By projecting the data onto a small number of linear combinations of the original variables (the PCs), so that most of the variance is retained in far fewer dimensions.
What happens if the # of PCs = the # of original variables?
Data approximation is exact (think all variables being used in some way, 100% of the variability is explained)
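A numpy sketch of exact reconstruction (simulated data): keeping all PCs is just an orthogonal rotation, so rotating the scores back recovers the centred data exactly.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(50, 3))
Xc = X - X.mean(axis=0)

_, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
scores = Xc @ eigvecs                 # keep all 3 PCs
X_rebuilt = scores @ eigvecs.T        # invert the rotation
print(np.allclose(X_rebuilt, Xc))     # True: the approximation is exact
```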
T/F PCA most useful for data with strong NON-linear relationships
FALSE, most suitable for linear, it’s a linear technique
The sum of the scores of each PC must be:
0 (NOT 1)
Because the data are centred around 0 (mean 0) before PCA, the positive and negative deviations cancel out.
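This is easy to verify in numpy (illustrative data): since each centred column sums to zero, every linear combination of the columns, i.e. every PC's score vector, also sums to zero.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(80, 3))
Xc = X - X.mean(axis=0)               # centring is what makes the sums zero

_, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
scores = Xc @ eigvecs
print(scores.sum(axis=0))             # each PC's scores sum to ~0, not 1
```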
PC loading vectors
PC scores
Loading vectors: DIRECTIONS in space along which the data vary the most
Scores: PROJECTIONS along the directions
K-means clustering: at each iteration how does the number of clusters change?
The number of non-empty clusters stays the same or decreases (a cluster can lose all of its points); it never increases, since K centroids are fixed.
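A minimal numpy sketch of the k-means loop (assignment step, then centroid update; data and K are illustrative). Because points are only ever reassigned among the K existing centroids, the count of non-empty clusters can hold steady or shrink, never grow.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(60, 2))
k = 3
centroids = X[rng.choice(len(X), size=k, replace=False)]

for _ in range(10):
    # Assignment step: label each point with its nearest centroid
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Update step: move each non-empty cluster's centroid to its mean
    for j in range(k):
        if np.any(labels == j):        # an empty cluster keeps its old centroid
            centroids[j] = X[labels == j].mean(axis=0)

print(len(np.unique(labels)))          # number of non-empty clusters, <= k
```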