Explain the steps of K-Means Clustering
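A minimal sketch of the standard steps (Lloyd's algorithm), assuming Euclidean distance and points given as equal-length tuples of numbers. The steps are: (1) pick k initial centroids, (2) assign every point to its nearest centroid, (3) move each centroid to the mean of its assigned points, (4) repeat 2-3 until the assignments stop changing. The function name `kmeans` and its parameters are illustrative, not a particular library's API.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm sketch (hypothetical helper, Euclidean distance)."""
    rng = random.Random(seed)
    # Step 1: choose k initial centroids (here: k distinct data points at random).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels
```

With `pts = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]` and `k=2`, the two well-separated groups end up in different clusters. Because step 1 is random, less clean data can give different results for different seeds.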
How does the initial placement of centroids impact the clustering?
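Initial centroids affect both the quality of the final clustering and the speed of convergence: K-Means only converges to a local optimum, so a poor start can yield worse clusters and need more iterations. They can be selected at random (e.g., k random data points) or chosen to maximize the distance between the initial centroids.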
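The two initialization strategies can be sketched as follows, assuming points are tuples of numbers. "Maximize distance" is implemented here as greedy farthest-first selection, one possible reading; k-means++ is a related randomized variant that is not shown. Both function names are hypothetical.

```python
import math
import random


def random_init(points, k, seed=0):
    """Random initialization: k distinct data points chosen uniformly."""
    return random.Random(seed).sample(points, k)


def farthest_point_init(points, k, seed=0):
    """Farthest-first initialization: each new centroid is the point whose
    distance to its nearest already-chosen centroid is largest."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]  # first centroid: a random data point
    while len(centroids) < k:
        next_c = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(next_c)
    return centroids
```

Farthest-first spreads the starting centroids across the data, but it is sensitive to outliers, since an isolated point is by definition far from everything.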
How do you evaluate clustering?
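Two common internal measures are inertia (the within-cluster sum of squared distances, lower means tighter clusters) and the mean silhouette coefficient (in [-1, 1], higher means better-separated clusters). A minimal sketch of both, assuming Euclidean distance, tuple points, integer labels, and at least two clusters; the helper names are hypothetical.

```python
import math


def inertia(points, centroids, labels):
    """Within-cluster sum of squared distances to the assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    labels = list(labels)
    clusters = set(labels)
    scores = []
    for i, (p, li) in enumerate(zip(points, labels)):
        same = [math.dist(p, q) for j, (q, lj) in enumerate(zip(points, labels))
                if lj == li and j != i]
        if not same:  # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lj in zip(points, labels) if lj == c)
            / labels.count(c)
            for c in clusters if c != li
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)
```

Inertia always drops as k grows, so it is usually read as an "elbow" across several k values, whereas silhouette can be compared directly between clusterings.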
When do we want to perform dimensionality reduction? What are the two options?
When the dataset has so many dimensions (attributes) that it hinders exploratory data analysis (EDA).
Option 1: Attribute selection: keep only the most informative attributes and drop the others
Option 2: Dimensionality reduction: merge redundant attributes into a lower-dimensional space
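A minimal sketch of Option 1, assuming variance as a crude stand-in for "informativeness" (other criteria, such as correlation with a target, are equally valid). The helper `select_attributes` is hypothetical. Variance depends on units, so attributes on different scales would normally be standardized first.

```python
import statistics


def select_attributes(rows, names, m):
    """Keep the m attributes with the highest variance; drop the rest."""
    columns = list(zip(*rows))
    variances = [statistics.pvariance(col) for col in columns]
    keep = sorted(range(len(names)), key=lambda j: variances[j], reverse=True)[:m]
    keep.sort()  # preserve the original column order
    kept_rows = [[row[j] for j in keep] for row in rows]
    return [names[j] for j in keep], kept_rows
```

For example, with rows `(1, 5.0, 100), (2, 5.0, 300), (3, 5.0, 200)` and `m=2`, the constant middle attribute is dropped. Option 2 instead builds new combined attributes, as PCA does.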
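What is PCA?
Principal Component Analysis (PCA): a projection of high-dimensional data onto a lower-dimensional space, computed by matrix decomposition (eigendecomposition of the covariance matrix or SVD of the centered data) so that the retained directions capture as much of the variance as possible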
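One way to write the projection, assuming a data matrix $X \in \mathbb{R}^{n \times d}$ with one row per observation, column-mean vector $\mu$, and target dimension $k$ (symbols introduced here for illustration):

```latex
\begin{aligned}
\tilde{X} &= X - \mathbf{1}_n \mu^\top && \text{(center each attribute)} \\
C &= \tfrac{1}{n-1}\,\tilde{X}^\top \tilde{X} = V \Lambda V^\top && \text{(eigendecomposition, } \lambda_1 \ge \dots \ge \lambda_d\text{)} \\
Z &= \tilde{X}\, V_k \in \mathbb{R}^{n \times k} && \text{(}V_k\text{: first } k \text{ columns of } V\text{)}
\end{aligned}
```

Equivalently, with the SVD $\tilde{X} = U \Sigma V^\top$, the same $V$ appears and $\lambda_i = \sigma_i^2/(n-1)$.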
What are the steps of conducting PCA?
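A minimal sketch of the usual steps (center, covariance, eigendecomposition, keep the top k, project), assuming numeric rows and that centering is enough; if attributes are on very different scales, standardize them first. The eigendecomposition uses power iteration with deflation, a simple substitute for a library routine such as NumPy's `numpy.linalg.eigh`, and assumes distinct eigenvalues. The name `pca` and its parameters are hypothetical.

```python
import math
import random


def pca(rows, k, iters=500, seed=0):
    """Return (components, eigenvalues, projected) for the top-k principal axes."""
    n, d = len(rows), len(rows[0])

    # Step 1: center every attribute on its mean.
    means = [sum(col) / n for col in zip(*rows)]
    X = [[x - m for x, m in zip(row, means)] for row in rows]

    # Step 2: sample covariance matrix (d x d).
    C = [[sum(X[r][i] * X[r][j] for r in range(n)) / (n - 1) for j in range(d)]
         for i in range(d)]

    # Step 3: eigenvectors of C, largest eigenvalue first (power iteration + deflation).
    rng = random.Random(seed)
    components, eigenvalues = [], []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left in the remaining directions
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T C v = variance along the unit vector v.
        lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        eigenvalues.append(lam)
        # Deflation: remove this direction so the next pass finds the next axis.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]

    # Step 4: project the centered data onto the k principal axes.
    projected = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
                 for row in X]
    return components, eigenvalues, projected
```

For points lying roughly on the line y = 2x, such as `(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)`, the first component comes out close to ±(0.45, 0.89) and carries almost all of the variance, so keeping k = 1 loses very little.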